Informatica PowerCenter (File) - Import - 7.1

Talend Data Catalog Bridges

author
Talend Documentation Team
EnrichVersion
7.1
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Data Catalog

Bridge Specifications

Vendor Informatica
Tool Name PowerCenter
Tool Version 8.x to 10.x
Tool Web Site http://www.informatica.com/products_services/powercenter/Pages/index.aspx
Supported Methodology [Data Integration] Multi-Model, ETL (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via XML File
Remote Repository Browsing for Model Selection
Data Profiling
Multi-Model Harvesting
Incremental Harvesting

BRIDGE INFORMATION
Import tool: Informatica PowerCenter 8.x to 10.x (http://www.informatica.com/products_services/powercenter/Pages/index.aspx)
Import interface: [Data Integration] Multi-Model, ETL (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via XML File from Informatica PowerCenter (File)
Import bridge: 'InformaticaPowerCenter' 10.1.0

BRIDGE DOCUMENTATION
This bridge imports Informatica PowerCenter metadata objects from an XML file exported from Informatica.


Note about completeness of the Metadata Import for Lineage Presentation.

When one asks for a lineage of a data warehouse column, one expects to get an answer reflecting the run-time reality of the whole ETL process (execution plan). The process can involve multiple source systems, multiple phases, and multiple increments.

In the case of Informatica, an ETL process executes workflows. A workflow can be executed many times with different parameters, but not every type of parameter affects lineage. Connectivity or SQL override parameters that change sources or targets do affect lineage. Date parameters that control incremental loading do not affect lineage.

The Informatica repository stores workflows in folders. The Informatica folder structure can be aligned with the execution process structure, but does not have to be.

It is recommended to import Informatica metadata according to the ETL execution process. To do so, consult the data warehouse/ETL architect or administrator to obtain a list of all workflows and corresponding parameters that make up the whole ETL process.

Included in this analysis may be:

- multiple source systems - a single-source execution plan extracts data from a single instance of a single source system. A multi-source execution plan extracts data from multiple instances of the same source system (Homogeneous), or from multiple instances of dissimilar source systems (Heterogeneous). For example, a business might have an instance of ERP in one location and time zone and another instance of the same ERP in another location and time zone (Homogeneous). Or a business might have an instance of CRM in one location, an instance of ERP in another location, and a second instance of the ERP in yet a third location (Heterogeneous).

- multiple phases - extraction, loading, post-processing

- multiple increments - full, incremental and micro loads


SUPPORT
Q: How do I provide metadata to the support team to reproduce an issue?

Create a backup of the Informatica PowerCenter metadata you are trying to import by using the pmrep command on the command line on a machine that has a full client installation of Informatica PowerCenter and is able to connect to the repository.

- Open a command line window on the application server and go to the directory that contains pmrep.exe (the same directory as for the 'Path to the Informatica binary files' parameter for the bridge).

- Enter the following commands:
- pmrep.exe
- connect -r Repo_name -n user_name -x password -h host -o port
where these variables match the entries in the bridge for the PowerCenter Repository connection
- backup -o c:\temp\InfaBackup.dat

The backup file is written to the Informatica PowerCenter Server and may be retrieved there.
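
The commands above can be assembled for review before they are typed into pmrep. This is a minimal sketch only: the function name and its arguments are hypothetical, it does not run pmrep, and the only flags it uses are the ones listed in the steps above.

```python
def build_pmrep_backup_commands(repo, user, password, host, port, output_file):
    """Return the pmrep commands from the steps above, in order.

    The function and argument names are illustrative. Only the flags
    (-r, -n, -x, -h, -o) come from the documented procedure.
    """
    connect = f"connect -r {repo} -n {user} -x {password} -h {host} -o {port}"
    backup = f"backup -o {output_file}"
    return ["pmrep.exe", connect, backup]


if __name__ == "__main__":
    # Placeholder values; replace them with the bridge's repository settings.
    for line in build_pmrep_backup_commands(
        "Repo_name", "user_name", "password", "host", "port",
        r"c:\temp\InfaBackup.dat",
    ):
        print(line)
```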


Bridge Parameters

Parameter Name Description Type Values Default Scope
File: The bridge uses an XML file generated using Informatica PowerCenter object export.

For example, to export one or more objects using PowerCenter Designer or Repository Manager into an XML file:
1. Start the PowerCenter tool.
2. Browse the repository and select the objects to be exported.
3. Choose 'Export Objects...' from the 'Repository' menu.
4. Export the selected objects into an XML file.

For more details about exporting and importing objects, see 'Exporting and Importing Objects' in the Informatica Repository Guide.

This bridge will use the generated XML file as input.
Type: FILE. Values: *.xml. Scope: Mandatory.
Parameter files directory: Directory containing Informatica PowerCenter parameter file(s). For detailed information about parameter files, please see the product documentation.


Informatica uses substitution parameters that are replaced with values at ETL execution (runtime). Just like Informatica itself, the bridge looks for these values in the parameter files available to it. This bridge parameter should be given a directory name, which is the root directory for all of these parameter files.

If the bridge cannot find a name=value pair for a given substitution parameter, the bridge will generally fail to parse the metadata correctly and report warnings and errors accordingly. It will also report the name of the undefined substitution parameter.
There are several ways in which to define substitution parameters and place the parameter files in the directory referred to here:
- If you are importing one workflow that uses a parameter file, name the file 'parameters.prm' and place it in the directory.

- If you are importing multiple workflows that re-use the same parameter file, name the file 'parameters.prm' and place it in the directory.

- If you are importing multiple workflows that use different parameter files, place these files in sub-directories under the directory. Each parameter file is named after the workflow that uses it (with the extension '.prm') and is placed in a sub-directory named after the workflow's repository folder.
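
The layout described above can be sketched as a small path helper. This is only an illustration of the naming rules; the function name and its arguments are hypothetical and are not part of the bridge.

```python
from pathlib import PurePosixPath


def expected_parameter_file(root, folder=None, workflow=None):
    """Return where a parameter file would be placed under the root directory.

    - shared file:        <root>/parameters.prm
    - per-workflow file:  <root>/<repository folder>/<workflow name>.prm
    """
    root = PurePosixPath(root)
    if folder and workflow:
        return root / folder / f"{workflow}.prm"
    return root / "parameters.prm"


# One shared file for all imported workflows.
shared = expected_parameter_file("/params")
# A dedicated file for workflow 'WF1' stored in repository folder 'Folder1'.
dedicated = expected_parameter_file("/params", "Folder1", "WF1")
```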


Path Prefix:
You can use a special substitution parameter, '$Static_Directory_Prefix@@' (see the variable definition section below), to prefix any relative paths to your parameter files. For example, if your session refers to the parameter file 'folder/subfolder/param.txt' and this variable is defined, MIMB prepends its value to the relative path and then tries to find the parameter file. You can also use this special substitution parameter to resolve absolute paths, but only if your Informatica server runs on Unix and the parameter files are referenced by Unix absolute paths. For example, if a parameter file is referred to as '/opt/params/param.txt', you can recreate this directory structure on the Windows machine and specify its top directory as the value of '$Static_Directory_Prefix@@'.
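
The effect of the prefix can be illustrated as a simple path join. This is a sketch under the assumption that the prefix is prepended to the path as written in the session; the function name is hypothetical and the bridge's actual resolution logic may differ.

```python
from pathlib import PurePosixPath


def apply_static_prefix(prefix, param_path):
    """Join the prefix value and a parameter file path taken from a session."""
    # Strip the leading "/" so that a Unix absolute path such as
    # /opt/params/param.txt is re-rooted under the prefix directory.
    relative = param_path.lstrip("/")
    return PurePosixPath(prefix) / relative


# Relative path from a session, prefixed with the configured directory.
relative_case = apply_static_prefix("C:/infa_params", "folder/subfolder/param.txt")
# Unix absolute path mirrored under a local top directory.
absolute_case = apply_static_prefix("C:/infa_params", "/opt/params/param.txt")
```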

Groups: One may place group headers in the parameters.prm file in order to specify context for a name=value pair. Examples of groups:
[Global] - applies to all objects in the import.
[folder name.WF:workflow name.ST:session name] - applies to a specified session task in the specified workflow.
[folder name.WF:workflow name.WT:worklet name.ST:session name] - applies to a specified session task in a specified worklet of a specified workflow. If the session path has more than one worklet, use additional '.WT:worklet' constructs.
[folder name.session name] - applies to all sessions in the specified folder.
[folder name.workflow name] - applies to all workflows in the specified folder.
[session name] - applies to all sessions with specified name.

Examples of global vs. local group context:
- Defines source connection 'src1' name as 'customer_source_DB' for all imported objects:
[Global]
$DBConnection_src1=customer_source_DB

- Defines variable 'MyVar' value for session task 'session1' in the worklet 'task1' of the workflow 'WF1' in the folder 'Folder1':
[Folder1.WF:WF1.WT:task1.ST:session1]
$$MyVar=TEST
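
To make the group syntax concrete, here is a minimal sketch of a reader that collects name=value pairs per group header. It is not the bridge's parser: the function name is hypothetical, and treating pairs that appear before any header as 'Global' is an assumption made for this sketch.

```python
def parse_parameter_file(text):
    """Parse name=value pairs grouped under [group] headers.

    Pairs that appear before any header are stored under "Global"
    (an assumption made for this sketch).
    """
    groups = {}
    current = "Global"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new group header changes the context for the following pairs.
            current = line[1:-1].strip()
            groups.setdefault(current, {})
            continue
        if "=" in line:
            name, value = line.split("=", 1)
            groups.setdefault(current, {})[name.strip()] = value.strip()
    return groups


SAMPLE = """\
[Global]
$DBConnection_src1=customer_source_DB

[Folder1.WF:WF1.WT:task1.ST:session1]
$$MyVar=TEST
"""

groups = parse_parameter_file(SAMPLE)
```

With the sample above, groups maps each header to its own dictionary, so the global connection name and the session-level variable stay in separate contexts.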


The bridge looks for substitution parameter values in the following order:
1.) If a session in Informatica actually has defined a specific substitution parameter pathname, then the bridge first looks for that file and if found looks for that substitution parameter name and value
2.) Otherwise, if a workflow in Informatica actually has defined a specific substitution parameter pathname, then the bridge first looks for that file and if found looks for that substitution parameter name and value
3.) Otherwise, if there is a pathname in the directory specified here that matches the name of the workflow, the bridge then looks for that substitution parameter name and value. It works its way 'up' the directory structure (to a more general context) until it finds that substitution parameter name and value. One may specify a group in this file to apply name=value pairs to specific sessions.
4.) Otherwise, if there is still no value assigned, the bridge looks in the parameters.prm file in the directory specified here for the substitution parameter name and value.
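
The lookup order above can be sketched as a chain of contexts searched from most specific to most general. This is one reading of the documented order (fall through to the next context when no value is found), not the bridge's implementation; all names are hypothetical and each context is assumed to be already loaded into a dictionary.

```python
def resolve_parameter(name, session_file=None, workflow_file=None,
                      directory_files=(), root_file=None):
    """Return (value, source) for the first context that defines `name`.

    Each argument is a dict of name=value pairs already read from a file
    (or None when that file does not exist). `directory_files` is ordered
    from the most specific sub-directory file to the most general one.
    """
    candidates = [("session", session_file), ("workflow", workflow_file)]
    candidates += [("directory", pairs) for pairs in directory_files]
    candidates.append(("parameters.prm", root_file))
    for source, pairs in candidates:
        if pairs and name in pairs:
            return pairs[name], source
    # Nothing defines the parameter: the bridge would report it as undefined.
    return None, None


value, source = resolve_parameter(
    "$$MyVar",
    workflow_file={"$$Other": "1"},
    directory_files=[{"$$MyVar": "TEST"}],
    root_file={"$$MyVar": "GLOBAL"},
)
```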


When the bridge reports that a particular substitution parameter value is not found, this situation may occur when either:
- You have not collected all of the proper parameter files used by Informatica when executing ETL
- There are additional substitution parameter assignments made globally through environment variables to Informatica when executing ETL.

Obviously, if it is the first case, you should obtain the correct set of parameter files and not try to reproduce these assignments by hand. However, in the second case, here is the process one should follow to address this situation:
1.) Add the substitution parameter name=value pair to the parameters.prm file in the directory specified here. This value will then apply globally but be overridden if that same substitution parameter is defined in a narrower context. Hence, it will address the missing substitution parameter issue but not disturb already defined values.
2.) If you need to provide different values depending upon the context (say different workflows have different substitution values), add the substitution parameter name=value pair in a group (see above). This value will then apply only to the context defined in the group header, but again will be overridden if that same substitution parameter is defined in a narrower context. Hence, it will address the missing substitution parameter issue but not disturb already defined values.
3.) If you need to provide different values depending upon the context (say different workflows have different substitution values), you may also add the substitution parameter name=value pair in a file in the sub-directory structure for that context. Again, this value will then apply only to the context where you placed this name=value pair, however it is given precedence over any name=value pairs defined in a group in the root parameters.prm file. Hence, it will address the missing substitution parameter issue and be the value substituted, unless the session in Informatica was defined with a specific pathname for the substitution parameters.
4.) There should be no need to update the files at pathnames defined within Informatica for specific sessions, as these should be collected properly and made available to the bridge.


Connection types:
You can define the target DB type for a connection (that is not properly defined in Informatica) with a substitution parameter name like:
Connection.[name].DBTYPE

For example, if a connection with the name 'ODBC_Connection' is assigned the Oracle database type at runtime, you can use the 'DBTYPE' name=value pair (in this case the bridge will know that 'ODBC_Connection' is of type Oracle and will use the proper handler to parse its metadata):
Connection.ODBC_Connection.DBTYPE=ORACLE

The database type is case-insensitive. The list of possible values: Oracle, ODBC, Netezza, Microsoft SQL Server, DB2, Sybase, Informix, SAP BW, SAP R/3, Teradata.

Aliases:
When you know that two connections with different names target the same data in a database, you can use an 'ALIAS' to tell the bridge to treat them as the same data source in the lineage, like this:
Connection.ODBC_Connection.ALIAS=oracleDB
Connection.Oracle_Connection.ALIAS=oracleDB
With these definitions, the names of the specified connections will be replaced with 'oracleDB' at runtime and lineage will be computed accordingly.

Schemas:
You can override the default schema for a connection with 'SCHEMA':
Connection.DB_Conn.SCHEMA=dbo
In this case, an empty schema for DB_Conn will be replaced with 'dbo'.
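
The three connection properties above share the same 'Connection.[name].PROPERTY=value' shape, which can be illustrated with a short reader. This is a sketch for illustration only, with hypothetical function names; it does not reproduce how the bridge applies these values.

```python
def parse_connection_properties(lines):
    """Collect Connection.[name].PROPERTY=value pairs per connection name."""
    connections = {}
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if not sep or not key.startswith("Connection."):
            continue
        # The property is the last dot-separated token; the connection name
        # is whatever lies between "Connection." and that token.
        name, _, prop = key[len("Connection."):].rpartition(".")
        if name and prop.upper() in {"DBTYPE", "ALIAS", "SCHEMA"}:
            connections.setdefault(name, {})[prop.upper()] = value.strip()
    return connections


def lineage_name(connections, name):
    """Return the alias for a connection, or its own name if none is set."""
    return connections.get(name, {}).get("ALIAS", name)


props = parse_connection_properties([
    "Connection.ODBC_Connection.DBTYPE=ORACLE",
    "Connection.ODBC_Connection.ALIAS=oracleDB",
    "Connection.Oracle_Connection.ALIAS=oracleDB",
    "Connection.DB_Conn.SCHEMA=dbo",
])
```

In this example both aliased connections resolve to the same lineage name, while DB_Conn keeps its own name and only gains a default schema.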
Type: DIRECTORY.
Miscellaneous: Specify miscellaneous options identified with a -letter and value.

For example, -l -c -m 4G -j -Dname=value -Xms1G

-e allow any parameter file extensions. By default .TXT and .PRM file extensions for 'Parameter files directory' are supported.
-i remove illegal xml characters.
-cpd disable Cartesian product lineage for custom transformations with no port dependencies.
-v set environment variable(s) (e.g. -v var1=value -v var2='value with spaces').

-cd: split or merge file system connections by a directory path.

For example, a connection can have two root folders, a and b. If you imported separate file stores for each root folder, you may want to split the connection into two connections that can be resolved using these file stores. This can be achieved with the option:
-cd a_con=orig_con:/a
which requests the creation of the 'a_con' connection and moves the 'a' folder to it from the 'orig_con' connection. The result has the a_con and orig_con connections. The orig_con connection keeps the folder branch 'b' that is left over after splitting out the folder branch 'a'.

Here is a slightly more complex example:
-cd a_con=orig_con:/root/a - create the 'a_con' connection for the 'root/a' folder branch in the 'orig_con' connection.

You can also use the option to merge several connections into one. For example, when you have two file stores C:\a and B:\b, you can merge them with the option:
-cd C:\=B:\
This moves all folders from the B:\ connection to the C:\ connection, which ends up with the root folders a and b.

-m the maximum Java memory size, as a whole number (e.g. -m 4G or -m 2500M).
-j the last option, followed by Java command line options (e.g. -j -Dname=value -Xms1G).
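
The split and merge behavior of the -cd option described above can be modeled on a simple in-memory structure. This is a sketch of the documented outcome only: connections are represented as a hypothetical dictionary of folder lists, and the function names are not part of the bridge.

```python
def split_connection(connections, new_name, source_name, branch):
    """Move every folder under `branch` from one connection to a new one."""
    branch = branch.rstrip("/")
    moved, kept = [], []
    for folder in connections.get(source_name, []):
        in_branch = folder == branch or folder.startswith(branch + "/")
        (moved if in_branch else kept).append(folder)
    connections[source_name] = kept
    connections.setdefault(new_name, []).extend(moved)
    return connections


def merge_connections(connections, target_name, source_name):
    """Move all folders of `source_name` into `target_name`."""
    folders = connections.pop(source_name, [])
    connections.setdefault(target_name, []).extend(folders)
    return connections


# Models: -cd a_con=orig_con:/a
conns = {"orig_con": ["/a", "/a/in", "/b", "/b/out"]}
split_connection(conns, "a_con", "orig_con", "/a")

# Models: -cd C:\=B:\
stores = {"C:\\": ["a"], "B:\\": ["b"]}
merge_connections(stores, "C:\\", "B:\\")
```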
Type: STRING.

 

Bridge Mapping

Meta Integration Repository (MIR) Metamodel (based on the OMG CWM standard) | "Informatica PowerCenter (File)" Metamodel: Informatica PowerCenter (Mapping) | Mapping Comments
AggregationTransformation | Aggregator |
  Description | Description |
  Name | Name |
CallTransformation | Reusable Joiner, Reusable Java, Reusable Router, Reusable Sequence, Reusable Update Strategy, Mapplet Call, Reusable External Procedure, Reusable Flexible Target Key Transformation, Reusable Filter, Reusable Stored Procedure, Reusable Sorter, Reusable Custom Transformation, Reusable HTTP, Reusable XML Parser, Reusable Data Masking, Reusable Normalizer, Reusable Lookup, Reusable Rank, Reusable SQL, Reusable Aggregator, Reusable Expression, Reusable Union, Reusable XML Generator, Reusable Transaction Control |
  Description | Description |
  Name | Name |
ConnectionDataAttribute | Column, Mapping Variable, Mapplet Port |
  Description | Description |
  Name | Name |
ConnectionDataSet | Table, Mapplet Input, Mapplet Output, Mapping Variables, Mapplet Group |
  Description | Description |
  Name | Name |
ConnectionPackage | Package |
  Description | Description |
  Name | Name |
CustomTransformation | XML Generator, Flexible Target Key Transformation, Custom Transformation, HTTP, XML Parser, Data Masking |
  Description | Description |
  Name | Name |
DiModel | Mapping |
  Description | Description |
  Name | Name |
ExpressionTransformation | Expression |
  Description | Description |
  Name | Name |
FilteringTransformation | Filter, Router |
  Description | Description |
  Name | Name |
GenericConnectedTransformation | Stored Procedure, SQL |
  Description | Description |
  Name | Name |
GenericTransformation | External Procedure, Java, SQL Lookup, Normalizer, Update Strategy, Rank, Transaction Control |
  Description | Description |
  Name | Name |
JoinTransformation | Joiner |
  Description | Description |
  Name | Name |
  SortedInput | Sorted Input |
LookupTransformation | Lookup |
  Description | Description |
  Name | Name |
ReaderTransformation | Source Qualifier |
  ConnectionName | Connection Name |
  Description | Description |
  Name | Name |
SequenceGeneratorTransformation | Sequence |
  Description | Description |
  Name | Name |
SortingTransformation | Sorter |
  Description | Description |
  Name | Name |
StoreConnection | Connection |
  Description | Description |
  Name | Name |
  SystemType | Type |
TransformationDataAttribute | Lookup Port, Normalizer Port, Sorter Port, Output Transformation Port, Expression Port, Passthrough Transformation Port, Http Port, Data Masking Port, Rank Port, Transformation Port |
  Description | Description |
  Name | Name |
TransformationDataSet | Group, Router Group |
  Name | Name |
UnionTransformation | Union |
  Description | Description |
  Name | Name |
WriterTransformation | Target |
  ConnectionName | Connection Name |
  Description | Description |
  Name | Name |

Meta Integration Repository (MIR) Metamodel (based on the OMG CWM standard) | "Informatica PowerCenter (File)" Metamodel: Informatica PowerCenter (Data Store) | Mapping Comments
Attribute | Fixed Width Column, Column, Delimited Column |
  Description | Description |
  Name | Name |
  Position | Position, Column Offset |
Class | Table |
  Description | Description |
  Name | Name |
DatabaseSchema | Schema |
  Name | Name |
FileDirectory | Directory |
  Name | Name |
FlatTextFile | Delimited File, Fixed Width File |
  Description | Description |
  Name | Name |
StoreModel | Data Store |
  Name | Name |
  SystemType | Type |

Meta Integration Repository (MIR) Metamodel (based on the OMG CWM standard) | "Informatica PowerCenter (File)" Metamodel: Informatica PowerCenter (Workflow) | Mapping Comments
ContainerStep | Embedded Worklet |
  Name | Name |
DiModel | Workflow |
  Name | Name |
EmbeddedCallStep | Worklet, Session |
  Name | Name |
StartStep | Start |
  Name | Name |