Informatica PowerCenter (File) - Import - Cloud

Talend Cloud Data Catalog Bridges

Version
Cloud
Language
English
Product
Talend Cloud
Module
Talend Data Catalog
Last publication date
2024-04-07

Note: AWS

Bridge Requirements

This bridge:
  • requires Internet access to https://repo.maven.apache.org/maven2/ and/or other tool sites to download drivers into <TDC_HOME>/data/download/MIMB/.

Bridge Specifications

Vendor Informatica
Tool Name PowerCenter
Tool Version 8.x to 10.x
Tool Web Site https://docs.informatica.com/data-integration/powercenter/
Supported Methodology [Data Integration] Multi-Model, ETL (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via XML File
Data Profiling
Incremental Harvesting
Multi-Model Harvesting
Remote Repository Browsing for Model Selection

SPECIFICATIONS
Tool: Informatica / PowerCenter version 8.x to 10.x via XML File
See https://docs.informatica.com/data-integration/powercenter/
Metadata: [Data Integration] Multi-Model, ETL (Source and Target Data Stores, Transformation Lineage, Expression Parsing)
Component: InformaticaPowerCenter version 11.2.0

DISCLAIMER
This import bridge requires internet access to download third-party libraries:
- such as https://repo.maven.apache.org/maven2/ to download open source third-party libraries,
- and more sites for other third-party software such as database specific JDBC drivers.

The downloaded third-party libraries are stored into $HOME/data/download/MIMB/
- If HTTPS fails, the import bridge then tries with HTTP.
- If a proxy is used to access internet, you must configure that proxy in the JRE (see the -j option in the Miscellaneous parameter).
- If the import bridge does not have full access to internet, that $HOME/data/download/MIMB/ directory can be copied from another server with internet access where the command $HOME/bin/MIMB.sh (or .bat) -d can be used to download all third-party libraries used by all bridges at once.

By running this import bridge, you hereby acknowledge responsibility for the license terms and any potential security vulnerabilities from these downloaded third-party software libraries.

OVERVIEW
This import bridge imports Informatica PowerCenter metadata objects from an XML file exported from Informatica.

REQUIREMENTS
It is recommended that one should import Informatica metadata according to the ETL execution process. In order to do so, please consult with the data warehouse/ETL architect/administrator to find out a list of all workflow and corresponding parameters that make up the whole ETL process.

Included in this analysis may be:

Multiple source systems - a single-source execution plan extracts data from a single instance of a single source system. A multi-source execution plan extracts data from multiple instances of the same source system, (Homogeneous) or multiple instances of dissimilar source systems (Heterogeneous). For example, a business might have an instance of ERP in one location and time zone and another instance of the same ERP in another location and time zone (Homogeneous). Or a business might have an instance of CRM in one location, an instance of ERP in another location, and a second instance of the ERP in yet a third location (Heterogeneous).

- multiple phases - extraction, loading, post-processing

- multiple increments - full, incremental and micro loads

FREQUENTLY ASKED QUESTIONS
n/a

LIMITATIONS
Refer to the current general known limitations at http://metaintegration.com/Products/MIMB/MIMBKnownLimitations.html or bundled in Documentation/ReadMe/MIMBKnownLimitations.html

Note about completeness of the Metadata Import for Lineage Presentation.

When one asks for a lineage of a data warehouse column, one expects to get an answer reflecting the run-time reality of the whole ETL process (execution plan). The process can involve multiple source systems, multiple phases, and multiple increments.

In the case of Informatica, an ETL process executes workflows. A workflow can be executed many times with different parameters, but not every type of parameters affect lineage. Connectivity or SQL overwrite parameters that change sources or targets that affect lineage. Date parameters that control incremental loading do not affect lineage.

The Informatica repository stores workflows in folders. The Informatica folder structure can be aligned with execution process structure, but does not have to.

SUPPORT
Provide a troubleshooting package with:
- the debug log (can be set in the UI or in conf/conf.properties with MIR_LOG_LEVEL=6)
- the metadata backup if available (can be set in the Miscellaneous parameter with -backup option, although this common option is not implemented on all bridges for technical reasons).

Q: How do I provide metadata to the support team to reproduce an issue?
A: Create a backup of the Informatica PowerCenter metadata you are trying to import by using the pmrep command on the command line on a machine that has a full client installation of Informatica PowerCenter and is able to connect to the repository.

- Open a command line window on the app server and go to the directory where pmrep.exe is (same as for the 'Path to the Informatica binary files' parameter for the import bridge)

Enter the following commands:
- "pmrep.exe"
- "connect -r Repo_name -n user_name -x password -h host -o port"
where these variables match the entries in the import bridge for the PowerCenter Repository connection
- "backup -o c:\temp\InfaBackup.dat"

The backup file is written to the Informatica PowerCenter Server and may be retrieved there.


Bridge Parameters

Parameter Name Description Type Values Default Scope
File The import bridge uses an XML file generated using Informatica PowerCenter object export.

For example, to export one or more objects using PowerCenter Designer or Repository Manager into an XML file:
1. Start the PowerCenter tool.
2. Browse the repository and select the objects to be exported
3. Choose 'Export Objects...' from the 'Repository' menu.
4. Export the selected objects into an XML file.

For more details about exporting and importing objects, see 'Exporting and Importing Objects' in the Informatica Repository Guide.

This import bridge will use the generated XML file as input.
FILE *.xml   Mandatory
Parameter files directory Directory containing Informatica PowerCenter parameters file(s). For detailed information about parameters files, please see the product documentation.

Informatica uses substitution parameters that are substituted with values at ETL execution (runtime). Just like Informatica itself, the import bridge looks for these values in parameter files available to the import bridge. This import bridge parameter should be given a directory name, which is the root directory for all of these parameters files.

If the import bridge cannot find a name=value pair for a given substitution parameter, the import bridge will generally fail to parse the metadata correctly and report warnings and errors accordingly. It will also report the name of the undefined substitution parameter.
There are several ways in which to define substitution parameters and place the parameter files in the directory referred to here:
- If you are importing one workflow that uses a parameter file please name it 'parameters.prm' and place the file in the directory.

- If you are importing multiple workflows that re-use the same parameter file please name it 'parameters.prm' and place it in the directory.

- If you are importing multiple workflows that use different parameter files please place these files under the directory in sub-directories. Each parameter file has a name of the workflow that uses it (with extension '.prm') and is placed in a sub-directory that has a name of the workflow's repository folder.

Path Prefix:
You can use a special substitution parameter (look at the variable definition section below) '$Static_Directory_Prefix@@' to prefix any relative paths for your parameter files. For example, if your session refers to a parameter file 'folder/subfolder/param.txt', then when this variable is defined, MIMB will prefix the value to the relative path and try to find the parameter file. You can also use this special substitution parameter to resolve any files, only if your Informatica server is running on Unix and the parameter files are based on Unix absolute paths. For example, if your parameter files are being referred to as '/opt/params/param.txt', then you can create this directory structure on the Windows machine and specify the top directory as the value for the special substitution parameter '$Static_Directory_Prefix@@'.

Groups: One may place group headers in the parameters.prm file in order to specify context for a name=value pair. Examples of groups:
[Global] - applies to all objects in the import.
[folder name.WF:workflow name.ST:session name] - applies to a specified session task in the specified workflow.
[folder name.WF:workflow name.WT:worklet name.ST:session name] - applies to a specified session tasks from a specified worklet in a specified workflow. If session path has more than one worklet, use additional '.WT:worklet' constructs.
[folder name.session name] - applies to all sessions in the specified folder.
[folder name.workflow name] - applies to all workflows in the specified folder.
[session name] - applies to all sessions with specified name.

Examples of global vs. local group context:
- Defines source connection 'src1' name as 'customer_source_DB' for all imported objects:
[Global]
$DBConnection_src1=customer_source_DB

- Defines variable 'MyVar' value for session task 'session1' in the worklet 'task1' of the workflow 'WF1' in the folder 'Folder1':
[Folder1.WF:WF1.WT:task1.ST:session1]
$$MyVar=TEST

The import bridge looks for substitution parameter values in the following order:
1.) If a session in Informatica actually has defined a specific substitution parameter pathname, then the import bridge first looks for that file and if found looks for that substitution parameter name and value
2.) Otherwise, if a workflow in Informatica actually has defined a specific substitution parameter pathname, then the import bridge first looks for that file and if found looks for that substitution parameter name and value
3.) Otherwise, if there is a pathname in the directory specified here that matches the name of the workflow, the import bridge then looks for that substitution parameter name and value. It works its way 'up' the directory structure (to a more general context) until it finds that substitution parameter name and value. One may specify a group in this file to apply name=value pairs to specific sessions.
4.) Otherwise, if there is still not value assigned, the import bridge looks in the parameters.prm file in the directory specified here for the substitution parameter name and value.

When the import bridge reports that a particular substitution parameter value is not found, this situation may occur when either:
- You have not collected all of the proper parameter files used by Informatica when executing ETL
- There are additional substitution parameter assignments made globally through environment variables to Informatica when executing ETL.

Obviously, if it is the first case, you should obtain the correct set of parameter files and not try to reproduce these assignments by hand. However, in the second case, here is the process one should follow to address this situation:
1.) Add the substitution parameter name=value pair to the parameters.prm file in the directory specified here. This value will then apply globally but be overridden if that same substitution parameter is defined in a narrower context. Hence, it will address the missing substitution parameter issue but not disturb already defined values.
2.) If you need to provide different values depending upon the context (say different workflows have different substitution values), add the substitution parameter name=value pair in a group (see above). This value will then apply only to the context defined in the group header, but again will be overridden if that same substitution parameter is defined in a narrower context. Hence, it will address the missing substitution parameter issue but not disturb already defined values.
3.) If you need to provide different values depending upon the context (say different workflows have different substitution values), you may also add the substitution parameter name=value pair in a file in the sub-directory structure for that context. Again, this value will then apply only to the context where you placed this name=value pair, however it is given precedence over any name=value pairs defined in a group in the root parameters.prm file. Hence, it will address the missing substitution parameter issue and be the value substituted, unless the session in Informatica was defined with a specific pathname for the substitution parameters.
4.) There should be no need to update the files at pathnames defined within Informatica for specific sessions, as these should be collected properly and made available to the import bridge.

Connection types:
You can define target DB types for the a connection (that is not properly defined in Informatica) with a substitution parameter name like:
Connection.[name].DBTYPE

For example if a connection with the name 'ODBC_Connections' is assigned the Oracle database type at runtime, you can use the 'DBTYPE' name=value pair (in this case the import bridge will know that 'ODBC_Connection' is of type Oracle and will use the proper handler to parse its metadata):
Connection.ODBC_Connection.DBTYPE=ORACLE

The database type is case insensitive. The list of possible values:
- Access
- Cassandra
- DB2
- Denodo
- Hive
- Informix
- Microsoft SQL Server
- MySQL
- Netezza
- ODBC
- PostgreSQL
- Presto
- Redshift
- Salesforce
- SAP BW
- SAP HANA
- SAP R/3
- Snowflake
- Sybase
- Teradata
- Vectorwise
- Vertica
- Oracle

Aliases:
When you know that two connections with different names target the same data in a database, one may use an 'ALIAS' to tell the import bridge to treat them as the same data source in the lineage like this:
Connection.ODBC_Connection.ALIAS=oracleDB
Connection.Oracle_Connection.ALIAS=oracleDB
with these definitions, the names of the specified connections will be replaced with 'oracleDB' on runtime and lineages will be computed accordingly.

Schemas:
You can override default schema for connection with 'SCHEMA':
Connection.DB_Conn.SCHEMA=dbo
empty schema for DB_Conn will be replaced with 'dbo' in this case.
DIRECTORY      
Miscellaneous INTRODUCTION
Specify miscellaneous options starting with a dash and optionally followed by parameters, e.g.
-connection.cast MyDatabase1="MICROSOFT SQL SERVER"
Some options can be used multiple times if applicable, e.g.
-connection.rename NewConnection1=OldConnection1 -connection.rename NewConnection2=OldConnection2;
As the list of options can become a long string, it is possible to load it from a file which must be located in ${MODEL_BRIDGE_HOME}\data\MIMB\parameters and have the extension .txt. In such case, all options must be defined within that file as the only value of this parameter, e.g.
ETL/Miscellaneous.txt

JAVA ENVIRONMENT OPTIONS
-java.memory <Java Memory's maximum size> (previously -m)

1G by default on 64bits JRE or as set in conf/conf.properties, e.g.
-java.memory 8G
-java.memory 8000M

-java.parameters <Java Runtime Environment command line options> (previously -j)

This option must be the last one in the Miscellaneous parameter as all the text after -java.parameters is passed "as is" to the JRE, e.g.
-java.parameters -Dname=value -Xms1G
The following option must be set when a proxy is used to access internet (this is critical to access https://repo.maven.apache.org/maven2/ and exceptionally a few other tool sites) in order to download the necessary third-party software libraries.
Note: The majority of proxies are concerned with encrypting (HTTPS) the outside (of the company) traffic and trust the inside traffic that can access proxy over HTTP. In this case, an HTTPS request reaches the proxy over HTTP where the proxy HTTPS-encrypts it.
-java.parameters -java.parameters -Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=3128 -Dhttp.proxyUser=user -Dhttp.proxyPassword=pass

MODEL IMPORT OPTIONS
-model.name <model name>

Override the model name, e.g.
-model.name "My Model Name"

-prescript <script name>

This option allows running a script before the bridge execution.
The script must be located in the bin directory (or as specified with M_SCRIPT_PATH in conf/conf.properties), and have .bat or .sh extension.
The script path must not include any parent directory symbol (..).
The script should return exit code 0 to indicate success, or another value to indicate failure.
For example:
-prescript "script.bat arg1 arg2"

-postscript <script name>

This option allows running a script after successful execution of the bridge.
The script must be located in the bin directory (or as specified with M_SCRIPT_PATH in conf/conf.properties), and have .bat or .sh extension.
The script path must not include any parent directory symbol (..).
The script should return exit code 0 to indicate success, or another value to indicate failure.
For example:
-postscript "script.bat arg1 arg2"

-cache.clear

Clears the cache before the import, and therefore will run a full import without incremental harvesting.

If the model was not changed and the -cache.clear parameter is not used (incremental harvesting), then a new version will not be created.
If the model was not changed and the -cache.clear parameter is set (full source import instead of incremental), then a new version will be created.

-backup <directory>

This option allows to save the bridge input metadata for further troubleshooting. The provided <directory> must be empty.

The primary use of this option is for data store import bridges, in particular JDBC based database import bridges.

Note that this option is not operational on some bridges including:
- File based import bridges (as such input files can be used instead)
- DI/BI repository import bridges (as the tool's repository native backup can be used instead)
- Some API based import bridges (e.g. COM based) for technical reasons.

DATA CONNECTION OPTIONS
Data Connections are produced by the import bridges typically from ETL/DI and BI tools to refer to the source and target data stores they use. These data connections are then used by metadata management tools to connect them (metadata stitching) to their actual data stores (e.g. databases, file system, etc.) in order to produce the full end to end data flow lineage and impact analysis. The name of each data connection is unique by import model. The data connection names used within DI/BI design tools are used when possible, otherwise connection names are generated to be short but meaningful such as the database / schema name, the file system path, or Uniform Resource Identifier (URI). The following option allows to manipulate connections. These options replaces the legacy options -c, -cd, and -cs.

-connection.cast ConnectionName=ConnectionType

Casts a generic database connection (e.g. ODBC/JDBC) to a precise database type (e.g. ORACLE) for SQL Parsing, e.g.
-connection.cast "My Database"="MICROSOFT SQL SERVER".
The list of supported data store connection types includes:
ACCESS
APACHE CASSANDRA
DB2/UDB
DENODO
GOOGLE BIGQUERY
HIVE
MYSQL
NETEZZA
ORACLE
POSTGRESQL
PRESTO
REDSHIFT
SALESFORCE
SAP HANA
SNOWFLAKE
MICROSOFT SQL AZURE
MICROSOFT SQL SERVER
SYBASE SQL SERVER
SYBASE AS ENTERPRISE
TERADATA
VECTORWISE
HP VERTICA

-connection.rename OldConnection=NewConnection

Renames an existing connection to a new name, e.g.
-connection.rename OldConnectionName=NewConnectionName
Multiple existing database connections can be renamed and merged into one new database connection, e.g.
-connection.rename MySchema1=MyDatabase -connection.rename MySchema2=MyDatabase

-connection.split oldConnection.Schema1=newConnection

Splits a database connection into one or multiple database connections.
A single database connection can be split into one connection per schema, e.g.
-connection.split MyDatabase
All database connections can be split into one connection per schema, e.g.
-connection.split *
A database connection can be explicitly split creating a new database connection by appending a schema name to a database, e.g.
-connection.split MyDatabase.schema1=MySchema1

-connection.map SourcePath=DestinationPath

Maps a source path to destination path. This is useful for file system connections when different paths points to the same object (directory or file).
On Hadoop, a process can write into a CSV file specified with the HDFS full path, but another process reads from a Hive table implemented (external) by the same file specified using a relative path with default file name and extension, e.g.
-connection.map /user1/folder=hdfs://host:8020/users/user1/folder/file.csv
On Linux, a given directory (or file) like /data can be referred to by multiple symbolic links like /users/john and /users/paul, e.g.
-connection.map /data=/users/John -connection.map /data=/users/paul
On Windows, a given directory like C:\data can be referred to by multiple network drives like M: and N:, e.g.
-connection.map C:\data=M:\ -connection.map C:\data=N:\

-connection.casesensitive ConnectionName

Overrides the default case insensitive matching rules for the object identifiers inside the specified connection, provided the detected type of the data store by itself supports this configuration (e.g. Microsoft SQL Server, MySql etc.), e.g.
-connection.casesensitive "My Database"

-connection.level AggregationLevel

Specifies the aggregation level for the external connections, e.g.-connection.level catalog
The list of the supported values:
server
catalog
schema (default)

ETL OPTIONS
These options are available on all ETL/DI tools and scripting import bridges in order to:
- Process all SQL parsing
- Build runtime models from design models and runtime variables
- Calculate runtime summary lineage

-etl.summary.add.ultimate (previously -ppus)

Adds the ultimate runtime lineage summary (may generate a very large model).

-etl.summary.only.ultimate (previously -ppsp)

Only produces the ultimate runtime lineage summary (may generate a very large model).

-etl.connections.add.internal.design (previously -pppd)

Adds the internal ETL design data connectors (needed for ETL conversion).

-etl.connections.remove.input (previously -ri)

Removes all input (source) data connections (to reduce the size of the generated model).

-etl.transformations.enable.xml

This option enables the parsing and support of XML transformations in DI/ETL import bridges such as Informatica PowerCenter, Informatica Developer, Microsoft SSIS, IBM DataStage, and Talend Data Integration.
In case of Informatica, such XML transformations corresponds to the following PowerCenter features
https://docs.informatica.com/data-integration/powercenter/10-5/xml-guide/midstream-xml-transformations.html
https://docs.informatica.com/data-integration/powercenter/10-5/xml-guide/xml-source-qualifier-transformation.html
Note that this option may significantly slow down the import.

INFORMATICA POWERCENTER OPTIONS
-pc.parameters.enable.extensions (previously -e)

Allow any parameter file extensions. By default .TXT and .PRM file extensions for 'Parameter files directory' are supported.

-pc.xml.ignore.illegalcharacters (previously -i)

Remove illegal XML characters.

-pc.transformations.disable.cartesianproduct (previously -cpd)

Disable Cartesian product lineage for SAP BAPI Transformation.
STRING      

 

Bridge Mapping

Meta Integration Repository (MIR)
Metamodel
(based on the OMG CWM standard)
"Informatica PowerCenter (File)"
Metamodel
Informatica PowerCenter (Mapping)
Mapping Comments
     
AggregationTransformation Aggregator  
Description Description  
Name Name  
CallTransformation Reusable Sequence, Mapplet Call, Reusable External Procedure, Reusable SAP BAPI Transformation, Reusable Sorter, Reusable Custom Transformation, Reusable Data Masking, Reusable Normalizer, Reusable Rank, Reusable SQL, Reusable Aggregator, Reusable Union, Reusable XML Generator, Reusable Transaction Control, Reusable Joiner, Reusable Java, Reusable Router, Reusable Update Strategy, Reusable Flexible Target Key Transformation, Reusable Filter, Reusable Stored Procedure, Reusable HTTP, Reusable XML Parser, Reusable WebServices Consumer, Reusable Lookup, Reusable Expression  
Description Description  
Name Name  
ConnectionDataAttribute Column, Mapping Variable, Mapplet Port  
Description Description  
Name Name  
ConnectionDataSet Table, Mapplet Input, Mapplet Output, Mapping Variables, File, Mapplet Group  
Description Description  
Name Name  
ConnectionNamespace Directory  
Name Name  
ConnectionPackage Package  
Description Description  
Name Name  
CustomTransformation Java, XML Generator, Flexible Target Key Transformation, Custom Transformation, HTTP, SAP BAPI Transformation, XML Parser, Data Masking  
Description Description  
Name Name  
DiModel Mapping  
Description Description  
Name Name  
ExpressionTransformation Expression  
Description Description  
Name Name  
FilteringTransformation Filter, Router  
Description Description  
Name Name  
GenericConnectedTransformation Stored Procedure, WebServices Consumer, SQL  
ConnectionName Connection Name  
Description Description  
Name Name  
GenericTransformation External Procedure, SQL Lookup, Normalizer, Update Strategy, Rank, Transaction Control  
Description Description  
Name Name  
JoinTransformation Joiner  
Description Description  
Name Name  
SortedInput Sorted Input  
LookupTransformation Lookup  
ConnectionName Connection Name  
Description Description  
Name Name  
ReaderTransformation Source Qualifier  
ConnectionName Connection Name  
Description Description  
Name Name  
SequenceGeneratorTransformation Sequence  
Description Description  
Name Name  
SortingTransformation Sorter  
Description Description  
Name Name  
StoreConnection Connection  
Description Description  
Name Name  
SystemType Type  
TransformationDataAttribute Lookup Port, Normalizer Port, Sorter Port, Output Transformation Port, Expression Port, Passthrough Transformation Port, Http Port, Data Masking Port, Rank Port, Transformation Port  
Description Description  
Name Name  
TransformationDataSet Group, Router Group  
Name Name  
UnionTransformation Union  
Description Description  
Name Name  
WriterTransformation Target  
ConnectionName Connection Name  
Description Description  
Name Name  

Meta Integration Repository (MIR)
Metamodel
(based on the OMG CWM standard)
"Informatica PowerCenter (File)"
Metamodel
Informatica PowerCenter (Data Store)
Mapping Comments
     
Attribute Fixed Width Column, Column, Delimited Column  
Description Description  
Name Name  
Position Position, Column Offset  
Class Table  
Description Description  
Name Name  
DataAttribute Data Attribute  
Name Name  
Position Position  
DatabaseSchema Schema  
Name Name  
FileDirectory Directory  
Name Name  
FlatTextFile Delimited File, Fixed Width File  
Description Description  
Name Name  
StoreModel Data Store  
Name Name  
SystemType Type  
StoredProcedure Stored Procedure  
Name Name  

Meta Integration Repository (MIR)
Metamodel
(based on the OMG CWM standard)
"Informatica PowerCenter (File)"
Metamodel
Informatica PowerCenter (Workflow)
Mapping Comments
     
ContainerStep Embedded Worklet  
Name Name  
DiModel Workflow  
Name Name  
EmbeddedCallStep Workflow Instance, Worklet, Session  
Name Name  
StartStep Start  
Name Name