Informatica PowerCenter (File) - Import - Cloud

Talend Cloud Data Catalog Bridges

author
Talend Documentation Team
EnrichVersion
Cloud
EnrichProdName
Talend Cloud
EnrichPlatform
Talend Data Catalog

Bridge Specifications

Vendor Informatica
Tool Name PowerCenter
Tool Version 8.x to 10.x
Tool Web Site http://www.informatica.com/products_services/powercenter/Pages/index.aspx
Supported Methodology [Data Integration] Multi-Model, ETL (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via XML File
Data Profiling
Remote Repository Browsing for Model Selection
Incremental Harvesting
Multi-Model Harvesting

SPECIFICATIONS
Tool: Informatica / PowerCenter version 8.x to 10.x via XML File
See http://www.informatica.com/products_services/powercenter/Pages/index.aspx
Metadata: [Data Integration] Multi-Model, ETL (Source and Target Data Stores, Transformation Lineage, Expression Parsing)
Component: InformaticaPowerCenter version 11.0.0

OVERVIEW
This bridge imports Informatica PowerCenter metadata objects from an XML file exported from Informatica.

REQUIREMENTS
It is recommended that one should import Informatica metadata according to the ETL execution process. In order to do so, please consult with the data warehouse/ETL architect/administrator to find out a list of all workflow and corresponding parameters that make up the whole ETL process.

Included in this analysis may be:

Multiple source systems - a single-source execution plan extracts data from a single instance of a single source system. A multi-source execution plan extracts data from multiple instances of the same source system, (Homogeneous) or multiple instances of dissimilar source systems (Heterogeneous). For example, a business might have an instance of ERP in one location and time zone and another instance of the same ERP in another location and time zone (Homogeneous). Or a business might have an instance of CRM in one location, an instance of ERP in another location, and a second instance of the ERP in yet a third location (Heterogeneous).

- multiple phases - extraction, loading, post-proccesing

- multiple increments - full, incremental and micro loads

FREQUENTLY ASKED QUESTIONS
n/a

LIMITATIONS
Refer to the current general known limitations at http://metaintegration.com/Products/MIMB/MIMBKnownLimitations.html or bundled in Documentation/ReadMe/MIMBKnownLimitations.html

Note about completeness of the Metadata Import for Lineage Presenation.

When one asks for a lineage of a data warehouse column, one expects to get an answer reflecting the run-time reality of the whole ETL process (execution plan). The process can involve multiple source systems, multiple phases, and multiple increments.

In the case of Informatica, an ETL process executes workflows. A workflow can be executed many times with different parameters, but not every type of parameters affect lineage. Connectivity or SQL overwrite parameters that change sources or targets that affect lineage. Date parameters that control incremental loading do not affect lineage.

The Informatica repository stores workflows in folders. The Informatica folder structure can be aligned with execution process structure, but does not have to.

SUPPORT
Provide a trouble shooting package with:
- the debug log (can be set in the UI or in conf/conf.properties with MIR_LOG_LEVEL=6)
- the metadata backup if available (can be set in the Miscellaneous parameter with option -backup)

Q: How do I provide metadata to the support team to reproduce an issue?
A: Create a backup of the Informatica PowerCenter metadata you are trying to import by using the pmrep command on the command line on a machine that has a full client installation of Informatica PowerCenter and is able to connect to the repository.

- Open a command line window on the app server and go to the directory where pmrep.exe is (same as for the 'Path to the Informatica binary files' parameter for the bridge)

Enter the following commands:
- "pmrep.exe"
- "connect -r Repo_name -n user_name -x password -h host -o port"
where these variables match the entries in the bridge for the PowerCenter Repository connection
- "backup -o c:\temp\InfaBackup.dat"

The backup file is written to the Informatica PowerCenter Server and may be retrieved there.


Bridge Parameters

Parameter Name Description Type Values Default Scope
File The bridge uses an XML file generated using Informatica PowerCenter object export.

For example, to export one or more objects using PowerCenter Designer or Repository Manager into an XML file:
1. Start the PowerCenter tool.
2. Browse the repository and select the objects to be exported
3. Choose 'Export Objects...' from the 'Repository' menu.
4. Export the selected objects into an XML file.

For more details about exporting and importing objects, see 'Exporting and Importing Objects' in the Informatica Repository Guide.

This bridge will use the generated XML file as input.
FILE *.xml   Mandatory
Parameter files directory Directory containing Informatica PowerCenter parameters file(s). For detailed information about parameters files, please see the product documentation.

Informatica uses substitution parameters that are substituted with values at ETL execution (runtime). Just like Informatica itself, the bridge looks for these values in parameter files available to the bridge. This bridge parameter should be given a directory name, which is the root directory for all of these parameters files.

If the bridge cannot find a name=value pair for a given substitution parameter, the bridge will generally fail to parse the metadata correctly and report warnings and errors accordingly. It will also report the name of the undefined substitution parameter.
There are several ways in which to define substitution parameters and place the parameter files in the directory referred to here:
- If you are importing one workflow that uses a parameter file please name it 'parameters.prm' and place the file in the directory.

- If you are importing multiple workflows that re-use the same parameter file please name it 'parameters.prm' and place it in the directory.

- If you are importing multiple workflows that use different parameter files please place these files under the directory in sub-directories. Each parameter file has a name of the workflow that uses it (with extension '.prm') and is placed in a sub-directory that has a name of the workflow's repository folder.

Path Prefix:
You can use a special substitution parameter (look at the variable definition section below) '$Static_Directory_Prefix@@' to prefix any relative paths for your parameter files. For example, if your session refers to a parameter file 'folder/subfolder/param.txt', then when this variable is defined, MIMB will prefix the value to the relative path and try to find the parameter file. You can also use this special substitution parameter to resolve any files, only if your Informatica server is running on Unix and the parameter files are based on Unix absolute paths. For example, if your parameter files are being referred to as '/opt/params/param.txt', then you can create this directory structure on the windows machine and specify the top directory as the value for the special substitution parameter '$Static_Directory_Prefix@@'.

Groups: One may place group headers in the parameters.prm file in order to specify context for a name=value pair. Examples of groups:
[Global] - applies to all objects in the import.
[folder name.WF:workflow name.ST:session name] - applies to a specified session task in the specified workflow.
[folder name.WF:workflow name.WT:worklet name.ST:session name] - applies to a specified session tasks from a specified worklet in a specified workflow. If session path has more than one worklet, use additional '.WT:worklet' constructs.
[folder name.session name] - applies to all sessions in the specified folder.
[folder name.workflow name] - applies to all workflows in the specified folder.
[session name] - applies to all sessions with specified name.

Examples of global vs. local group context:
- Defines source connection 'src1' name as 'customer_source_DB' for all imported objects:
[Global]
$DBConnection_src1=customer_source_DB

- Defines variable 'MyVar' value for session task 'session1' in the worklet 'task1' of the workflow 'WF1' in the folder 'Folder1':
[Folder1.WF:WF1.WT:task1.ST:session1]
$$MyVar=TEST

The bridge looks for substitution parameter values in the following order:
1.) If a session in Informatica actually has defined a specific substitution parameter pathname, then the bridge first looks for that file and if found looks for that substitution parameter name and value
2.) Otherwise, if a workflow in Informatica actually has defined a specific substitution parameter pathname, then the bridge first looks for that file and if found looks for that substitution parameter name and value
3.) Otherwise, if there is a pathname in the directory specified here that matches the name of the workflow, the bridge then looks for that substitution parameter name and value. It works its way 'up' the directory stucture (to a more general context) until it finds that substitution parameter name and value. One may specify a group in this file to apply name=value pairs to specific sessions.
4.) Otherwise, if there is still not value assigned, the bridge looks in the parameters.prm file in the directory specified here for the substitution parameter name and value.

When the bridge reports that a particular substitution parameter value is not found, this situation may occur when either:
- You have not collected all of the proper parameter files used by Informatica when executing ETL
- There are additional substitution parameter assignments made globally through environment variables to Informatica when executing ETL.

Obviously, if it is the first case, you should obtain the correct set of parameter files and not try to reproduce these assignments by hand. However, in the second case, here is the process one should follow to address this situation:
1.) Add the substitution parameter name=value pair to the parameters.prm file in the directory specified here. This value will then apply globally but be overridden if that same substitution parameter is defined in a narrower context. Hence, it will address the missing substitution parameter issue but not disturb already defined values.
2.) If you need to provide different values depending upon the context (say different workflows have different substitution values), add the substitution parameter name=value pair in a group (see above). This value will then apply only to the context defined in the group header, but again will be overridden if that same substitution parameter is defined in a narrower context. Hence, it will address the missing substitution parameter issue but not disturb already defined values.
3.) If you need to provide different values depending upon the context (say different workflows have different substitution values), you may also add the substitution parameter name=value pair in a file in the sub-directory structure for that context. Again, this value will then apply only to the context where you placed this name=value pair, however it is given precedence over any name=value pairs defined in a group in the root parameters.prm file. Hence, it will address the missing substitution parameter issue and be the value substituted, unless the session in Informatica was defined with a specific pathname for the substitution parameters.
4.) There should be no need to update the files at pathnames defined within Informatica for specific sessions, as these should be collected properly and made available to the bridge.

Connection types:
You can define target DB types for the a connection (that is not properly defined in Informatica) with a substitution parameter name like:
Connection.[name].DBTYPE

For example if a connection with the name 'ODBC_Connections' is assigned the Oracle database type at runtime, you can use the 'DBTYPE' name=value pair (in this case the bridge will know that 'ODBC_Connection' is of type Oracle and will use the proper handler to parse its metadata):
Connection.ODBC_Connection.DBTYPE=ORACLE

The database type is case insensitive. The list of possible values: Oracle,ODBC,Netezza,Microsoft SQL Server,DB2,Sybase,Informix,SAP BW, SAP R/3,Teradata.

Aliases:
When you know that two connections with different names target the same data in a database, one may use an 'ALIAS' to tell the bridge to treat them as the same data source in the lineage like this:
Connection.ODBC_Connection.ALIAS=oracleDB
Connection.Oracle_Connection.ALIAS=oracleDB
with these definitions, the names of the specified connections will be replaced with 'oracleDB' on runtime and linages will be computed accordingly.

Schemas:
You can override default schema for connection with 'SCHEMA':
Connection.DB_Conn.SCHEMA=dbo
empty schema for DB_Conn will be replaced with 'dbo' in this case.
DIRECTORY      
Miscellaneous INTRODUCTION
Specify miscellaneous options starting with a dash and optionally followed by parameters, e.g.
-connection.cast MyDatabase1="SQL Server"
Some options can be used multiple times if applicable, e.g.
-connection.rename NewConnection1=OldConnection1 -connection.rename NewConnection2=OldConnection2;
As the list of options can become a long string, it is possible to load it from a file which must be located in ${MODEL_BRIDGE_HOME}\data\MIMB\parameters and have the extension .txt. In such case, all options must be defined within that file as the only value of this parameter, e.g.
ETL/Miscellaneous.txt

JAVA ENVIRONMENT OPTIONS
-m <Java Memory's maximum size>
1G by default on 64bits JRE or as set in conf/conf.properties, e.g.
-m 8G
-m 8000M

-j <Java Runtime Environment command line options>
This option must be the last one in the Miscellaneous parameter as all the text after -j is passed "as is" to the JRE, e.g.
-j -Dname=value -Xms1G
The following option must be set when a proxy is used to access internet (this is critical to access https://repo.maven.apache.org/maven2/ (and exceptionally a few other tool sites) in order to download the necessary third party software libraries.
-j -Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=3128 -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=3128 -Dhttp.proxyUser=user -Dhttp.proxyPassword=pass -Dhttps.proxyUser=user -Dhttps.proxyPassword=pass

-jre <Java Runtime Environment full path name>
It can be an absolute path to javaw.exe on Windows or a link/script path on Linux, e.g.
-jre "c:\Program Files\Java\jre1.8.0_211\bin\javaw.exe"

-v <Environment variable value>
None by default, e.g.
-v var1=value1 -v var2="value2 with spaces"

MODEL IMPORT OPTIONS
-model.name <model name>
Override the model name, e.g.
-model.name "My Model Name"

-prescript <script name>
The script must be located in the bin directory, and have .bat or .sh extension.
The script path must not include any parent directory symbol (..).
The script should return exit code 0 to indicate success, or another value to indicate failure.
For example:
-prescript "script.bat arg1 arg2"

-cache.clear
Clears the cache before the import, and therefore will run a full import without incremental harvesting.
Warning: this is a system option managed by the application calling the bridge and should not be set by users.

-backup <directory>
Full path of an empty directory to save the metadata input files for further troubleshooting.

DATA CONNECTION OPTIONS
Data Connections are produced by the import bridges typically from ETL/DI and BI tools to refer to the source and target data stores they use. These data connections are then used by metadata management tools to connect them (metadata stitching) to their actual data stores (e.g. databases, file system, etc.) in order to produce the full end to end data flow lineage and impact analysis. The name of each data connection is unique by import model. The data connection names used within DI/BI design tools are used when possible, otherwise connection names are generated to be short but meaningful such as the database / schema name, the file system path, or Uniform Resource Identifier (URI). The following options allows to manipulate connections. These options replaces the legacy options -c, -cd, and -cs.

-connection.cast ConnectionName=ConnectionType
Casts a generic database connection (e.g. ODBC/JDBC) to a precise database type (e.g. ORACLE) for SQL Parsing, e.g.
-connection.cast "My Database"="SQL SERVER".
The list of supported data store connection types includes:
ACCESS
CASSANDRA
DB2
DENODO
HIVE
MYSQL
NETEZZA
ORACLE
POSTGRESQL
PRESTO
REDSHIFT
SALESFORCE
SAP HANA
SNOWFLAKE
SQL SERVER
SYBASE
TERADATA
VECTORWISE
VERTICA

-connection.rename OldConnection=NewConnection
Renames an existing connection to a new name, e.g.
-connection.rename OldConnectionName=NewConnectionName
Multiple existing database connections can be renamed and merged into one new database connection, e.g.
-connection.rename MySchema1=MyDatabase -connection.rename MySchema2=MyDatabase

-connection.split oldConnection.Schema1=newConnection
Splits a database connection into one or multiple database connections.
A single database connection can be split into one connection per schema, e.g.
-connection.split MyDatabase
All database connections can be split into one connection per schema, e.g.
-connection.split *
A database connection can be explicitly split creating a new database connection by appending a schema name to a database, e.g.
-connection.split MyDatabase.schema1=MySchema1

-connection.map DestinationPath=SourcePath
Maps a source path to destination path. This is useful for file system connections when different paths points to the same object (directory or file).
On Hadoop, a process can write into a CSV file specified with the HDFS full path, but another process reads from a HIVE table implemented (external) by the same file specified using a relative path with default file name and extension, e.g.
-connection.map hdfs://host:8020/users/user1/folder/file.csv=/user1/folder
On Linux, a given directory (or file) like /data can be referred to by multiple symbolic links like /users/john and /users/paul, e.g.
-connection.map /users/John=/data -connection.map /users/paul=/data
On Windows, a given directory like C:\data can be referred to by multiple network drives like M: and N:, e.g.
-connection.map M:\=C:\data -connection.map N:\=C:\data

-pppd
Create the connections and connection data sets in DI/ETL design models. This feature should only be used when intending to export to another DI/ETL tool.

INFORMATICA POWERCENTER OPTIONS
-e
Allow any parameter file extensions. By default .TXT and .PRM file extensions for 'Parameter files directory' are supported.

-i
Remove illegal XML characters.

-cpd
Disable Cartesian product lineage for custom transformations with no port dependencies.

-cd: split or merge file system connections by a directory path.
Useful to split a connection into two or more root folders, that can be stitched to two or more file stores. To split the connection 'orig_con' into two connections: 'a_con' based upon the 'a' folder and the rest remains in the orig_con connection:
-cd a_con=orig_con:/a

To split the connection 'orig_con' into two connections: 'a_con" based upon the 'root/a' folder:
-cd a_con=orig_con:/root/a -

Useful to merge several connections into one. To merge two connections 'C:' with a root folder 'R1' and 'B:' with a root folder 'R2' into one connection 'C:' with two root folders - 'R1' and 'R2':
-cd C:\=B:\
STRING      

 

Bridge Mapping

Meta Integration Repository (MIR)
Metamodel
(based on the OMG CWM standard)
"Informatica PowerCenter (File)"
Metamodel
Informatica PowerCenter (Mapping)
Mapping Comments
     
AggregationTransformation Aggregator  
Description Description  
Name Name  
CallTransformation Reusable Sequence, Mapplet Call, Reusable External Procedure, Reusable SAP BAPI Transformation, Reusable Sorter, Reusable Custom Transformation, Reusable Data Masking, Reusable Normalizer, Reusable Rank, Reusable SQL, Reusable Aggregator, Reusable Union, Reusable XML Generator, Reusable Transaction Control, Reusable Joiner, Reusable Java, Reusable Router, Reusable Update Strategy, Reusable Flexible Target Key Transformation, Reusable Filter, Reusable Stored Procedure, Reusable HTTP, Reusable XML Parser, Reusable WebServices Consumer, Reusable Lookup, Reusable Expression  
Description Description  
Name Name  
ConnectionDataAttribute Column, Mapping Variable, Mapplet Port  
Description Description  
Name Name  
ConnectionDataSet Table, Mapplet Input, Mapplet Output, Mapping Variables, File, Mapplet Group  
Description Description  
Name Name  
ConnectionNamespace Directory  
Name Name  
ConnectionPackage Package  
Description Description  
Name Name  
CustomTransformation Java, XML Generator, Flexible Target Key Transformation, Custom Transformation, HTTP, SAP BAPI Transformation, XML Parser, Data Masking  
Description Description  
Name Name  
DiModel Mapping  
Description Description  
Name Name  
ExpressionTransformation Expression  
Description Description  
Name Name  
FilteringTransformation Filter, Router  
Description Description  
Name Name  
GenericConnectedTransformation Stored Procedure, WebServices Consumer, SQL  
ConnectionName Connection Name  
Description Description  
Name Name  
GenericTransformation External Procedure, SQL Lookup, Normalizer, Update Strategy, Rank, Transaction Control  
Description Description  
Name Name  
JoinTransformation Joiner  
Description Description  
Name Name  
SortedInput Sorted Input  
LookupTransformation Lookup  
ConnectionName Connection Name  
Description Description  
Name Name  
ReaderTransformation Source Qualifier  
ConnectionName Connection Name  
Description Description  
Name Name  
SequenceGeneratorTransformation Sequence  
Description Description  
Name Name  
SortingTransformation Sorter  
Description Description  
Name Name  
StoreConnection Connection  
Description Description  
Name Name  
SystemType Type  
TransformationDataAttribute Lookup Port, Normalizer Port, Sorter Port, Output Transformation Port, Expression Port, Passthrough Transformation Port, Http Port, Data Masking Port, Rank Port, Transformation Port  
Description Description  
Name Name  
TransformationDataSet Group, Router Group  
Name Name  
UnionTransformation Union  
Description Description  
Name Name  
WriterTransformation Target  
ConnectionName Connection Name  
Description Description  
Name Name  

Meta Integration Repository (MIR)
Metamodel
(based on the OMG CWM standard)
"Informatica PowerCenter (File)"
Metamodel
Informatica PowerCenter (Data Store)
Mapping Comments
     
Attribute Fixed Width Column, Column, Delimited Column  
Description Description  
Name Name  
Position Position, Column Offset  
Class Table  
Description Description  
Name Name  
DataAttribute Data Attribute  
Name Name  
Position Position  
DatabaseSchema Schema  
Name Name  
FileDirectory Directory  
Name Name  
FlatTextFile Delimited File, Fixed Width File  
Description Description  
Name Name  
StoreModel Data Store  
Name Name  
SystemType Type  
StoredProcedure Stored Procedure  
Name Name  

Meta Integration Repository (MIR)
Metamodel
(based on the OMG CWM standard)
"Informatica PowerCenter (File)"
Metamodel
Informatica PowerCenter (Workflow)
Mapping Comments
     
ContainerStep Embedded Worklet  
Name Name  
DiModel Workflow  
Name Name  
EmbeddedCallStep Worklet, Session  
Name Name  
StartStep Start  
Name Name