IBM InfoSphere DataStage - Import - 7.1

Talend Data Catalog Bridges

author
Talend Documentation Team
EnrichVersion
7.1
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Data Catalog

Bridge Specifications

Vendor IBM
Tool Name InfoSphere DataStage
Tool Version 7.5 to 11.x
Tool Web Site http://www.ibm.com/software/data/infosphere/datastage/
Supported Methodology [Data Integration] Multi-Model, ETL (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via XML and DSX File
Data Profiling
Multi-Model Harvesting
Incremental Harvesting
Remote Repository Browsing for Model Selection

BRIDGE INFORMATION
Import tool: IBM InfoSphere DataStage 7.5 to 11.x (http://www.ibm.com/software/data/infosphere/datastage/)
Import interface: [Data Integration] Multi-Model, ETL (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via XML and DSX File from IBM InfoSphere DataStage
Import bridge: 'AscentialDataStage' 10.1.0

BRIDGE DOCUMENTATION
This bridge reads a DSX or XML file generated by DataStage.

Note that some models contain special characters that must be encoded to transfer correctly. XML files exported from DataStage often have encoding problems, whereas DSX files carry the correct encoding information. The problem typically manifests as XML parsing errors. To avoid these errors, export DSX files from DataStage.
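Before importing an XML export, a quick well-formedness check can indicate whether the file is likely to trigger the parsing errors described above. The following is a minimal sketch using only the Python standard library. It is not part of the bridge, the function name is hypothetical, and the bridge's own parser may accept or reject files differently.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(path):
    """Return (True, None) if the file parses as XML, else (False, message).

    Hypothetical helper: it only checks well-formedness (including byte
    sequences that are invalid for the declared encoding). It does not
    validate DataStage-specific content.
    """
    try:
        ET.parse(path)
        return True, None
    except ET.ParseError as exc:
        # The message includes the line and column of the first error.
        return False, str(exc)
```

If the check fails, re-exporting the same components as a DSX file is the workaround this page recommends.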


Bridge Parameters

Parameter Name Description Type Values Default Scope
File DataStage provides two export formats, XML and DSX. This import bridge supports both, although XML files sometimes cause parsing problems. To create one of these files:
- Open a project in the Designer Client
- Select the job folder you wish to export as a model
- Select the Export menu
- Select the DataStage Components... menu item
If a DataStage XML file produces a 'Failed to parse' error, this is usually due to an encoding error in the XML file. To work around this problem, export the model to a DSX file.
Type: FILE
Values: *.dsx, *.xml
Scope: Mandatory
Import scope This parameter limits the scope of the metadata imported from the selected XML file.

Options include:
- Table definitions only
- Sequences and standalone jobs, including table definitions that are used in them
- All supported objects: everything present in the selected XML file.

Note that for nearly all applications, including lineage and impact analysis, this parameter should be set to 'Sequences and standalone jobs', as this setting imports all sources, destinations, and transformations present in the DataStage file.
Type: ENUMERATED
Values: Table definitions only; Sequences and standalone jobs; All supported objects
Default: Sequences and standalone jobs
Variable values file File defining a list of DataStage variable values in the format:
variable1_name=variable1_value
variable2_name=variable2_value
...
variableN_name=variableN_value

DataStage uses substitution variables in many of the parameters that can be specified in jobs and connections. In many cases, the values that must be assigned to these variables are not provided by the XML file that this bridge parses. In such cases, the bridge reports a warning in the log saying that it 'could not determine the value of a variable' and leaves the variable name unsubstituted in the resulting model. To determine the correct substitution values, the bridge reads a variable values file that maps each variable name to the value to substitute.

Place the pathname of a variable value file in this parameter.

If you have DataStage parameter sets, place their directory's path in this parameter. DataStage stores parameter sets in a predefined directory structure:

ParameterSets
...ParameterSetName1
......ValuesFileName
...ParameterSetName2
......ValuesFileName

Typically, variable names that refer to environment variables start with a dollar sign ('$'). Variable names are not case sensitive, and all leading and trailing spaces are trimmed.

If a variable is found but its value is not in the variable values file, or no variable values file is specified here, the bridge tries to resolve the value from operating system environment variables (at the time the bridge executes).
The variable values file can also specify the system type for any connection name. Use the format:
'CONNECTION.'<ConnectionName>=<SystemType>
<SystemType>:={'DB2/UDB'|'MICROSOFT SQL SERVER'|'ORACLE'|'TERADATA'|'SYBASE SQL SERVER'|'INFORMIX'|'HIVE'}
This is useful for an ODBC connection when you know the actual database system behind it.
Type: FILE
Values: *.*
Miscellaneous Specify miscellaneous options, each identified by a '-' prefix and an optional value.
Use Statistics=<Statistics_file_path> to produce ETL import statistics, e.g.:
Statistics=C:\Temp\Stats.csv

-pppd: create the connections and connection data sets in DI/ETL design models. This feature should only be used when intending to export to another DI/ETL tool.

-cd: split or merge file system connections by a directory path.

For example, a connection can have two root folders, a and b. If you imported a separate file store for each root folder, you may want to split the connection into two connections that can be resolved against these file stores. This can be achieved with the option:
-cd a_con=orig_con:/a
This requests that the 'a_con' connection be created and that the 'a' folder be moved to it from the 'orig_con' connection. The result has the a_con and orig_con connections. The orig_con connection keeps the folder branch 'b' that is left over after splitting out the folder branch 'a'.

Here is a slightly more complex example:
-cd a_con=orig_con:/root/a - create the 'a_con' connection for the 'root/a' folder branch in the 'orig_con' connection.

You can also use the option to merge several connections into one. For example, if you have two file stores, C:\a and B:\b, you can merge them with the option:
-cd C:\=B:\
This moves all folders from the B:\ connection to the C:\ connection, which ends up with the root folders a and b.
Type: STRING
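The variable values file rules described above (name=value lines, names trimmed and case-insensitive, 'CONNECTION.' entries, and fallback to operating system environment variables) can be sketched in Python as follows. This is an illustration only, not the bridge's implementation. The function names are hypothetical, and several details are assumptions: values are trimmed, lines without '=' are skipped, and the leading '$' is removed before the environment lookup.

```python
import os

CONNECTION_PREFIX = "CONNECTION."


def parse_variable_values(text):
    """Split name=value lines into (variables, connection_system_types)."""
    variables = {}
    connections = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # assumption: blank or malformed lines are ignored
        name, value = line.split("=", 1)
        name = name.strip()    # names are trimmed of surrounding spaces
        value = value.strip()  # assumption: values are trimmed as well
        if name.upper().startswith(CONNECTION_PREFIX):
            # CONNECTION.<ConnectionName>=<SystemType>
            connections[name[len(CONNECTION_PREFIX):]] = value
        else:
            # Upper-casing the key makes lookups case-insensitive.
            variables[name.upper()] = value
    return variables, connections


def resolve(name, variables, environ=None):
    """Look up a variable, falling back to OS environment variables."""
    environ = os.environ if environ is None else environ
    key = name.strip().upper()
    if key in variables:
        return variables[key]
    # Assumption: '$NAME' maps to the environment variable 'NAME'.
    return environ.get(name.strip().lstrip("$"))
```

For example, a file containing `$DB_Name = prod` and `CONNECTION.odbc1=ORACLE` yields one variable that `resolve("$db_name", ...)` finds regardless of case, plus a system type hint for the connection named odbc1.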

 

Bridge Mapping

Meta Integration Repository (MIR) Metamodel (based on the OMG CWM standard)
"IBM InfoSphere DataStage" Metamodel (AscentialDataStage)
Mapping Comments
     
Association ForeignKey  
AssociationRole ForeignKey  
Multiplicity   Based on Foreign Key information
Source   Based on Foreign Key information
AssociationRoleNameMap ForeignKey  
Attribute Column  
Description Description  
Name Name  
Optional Nullable  
BaseType Column, StageVariable  
DataType SQL Type See datatype conversion array
Name   Based on the datatype
PhysicalName   Derived from the datatype
CandidateKey Column  
Class Table Definition  
CppClassType   Set to ENTITY
CppPersistent   Set to True
Description Description  
Name Table name Class Name. Computed if not set
PhysicalName   Class Physical Name. Computed from the 'name' if not set
ClassifierMap Stage 1 for Stage Variables (if any), plus 1 per input or output pin
Name Name  
Operation Constraint, SqlRef, SqlPrimary, SqlInsert  
DataAttribute Column, Stage Variable  
Description Description  
Name Name  
DataSet Stage 1 for Stage Variables (if any), plus 1 per input or output pin
Name Name  
DatabaseSchema Table Definition  
Name Owner in record identifier
DerivedType Column, StageVariable  
DataType SQL Type See datatype conversion array
Length Precision  
Name SQL Type  
PhysicalName   Derived from the datatype
Scale Scale  
UserDefined   Set to False
DesignPackage Jobs Categories, Shared Containers Categories, Table Definitions Categories  
Name Name, Source.DBDName from category
UserDefined   Set to False
FeatureMap Column, Stage Variable  
Name Name  
Operation Expression, Derivation, Initial Value  
ForeignKey ForeignKey  
StoreModel DataStage file  
Name   Based on the file name