Informatica PowerCenter (Repository) - Import - Cloud

Talend Cloud Data Catalog Bridges

author: Talend Documentation Team
EnrichVersion: Cloud
EnrichProdName: Talend Cloud
EnrichPlatform: Talend Data Catalog

Note: This bridge is not available in Talend Cloud Data Catalog by default. You need to set up a remote harvesting agent to import metadata from this source. For more information, see Deploying a remote harvesting agent in Talend Cloud Data Catalog Administration guide.

Bridge Requirements

This bridge:
  • requires the tool to be installed to access its SDK.

Bridge Specifications

Vendor Informatica
Tool Name PowerCenter
Tool Version 8.x to 10.x
Tool Web Site http://www.informatica.com/products_services/powercenter/Pages/index.aspx
Supported Methodology [Data Integration] Multi-Model, Metadata Repository, ETL (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via Command line API on Repository based XML File
Data Profiling
Remote Repository Browsing for Model Selection
Incremental Harvesting
Multi-Model Harvesting

SPECIFICATIONS
Tool: Informatica / PowerCenter version 8.x to 10.x via Command line API on Repository based XML File
See http://www.informatica.com/products_services/powercenter/Pages/index.aspx
Metadata: [Data Integration] Multi-Model, Metadata Repository, ETL (Source and Target Data Stores, Transformation Lineage, Expression Parsing)
Component: InformaticaPowerCenterRepository version 11.0.0

OVERVIEW
Imports directly from Informatica PowerCenter Repository.

REQUIREMENTS
Before attempting to import from the Informatica PowerCenter Repository using this bridge, do the following:
1. Confirm that the Informatica client software is installed on the machine where the bridge is to be run.
2. Using the username and password intended for the bridge, open Informatica Designer and verify that you can view the specific metadata to be imported.

FREQUENTLY ASKED QUESTIONS
Q: What PowerCenter transformations are supported by this bridge?
A: The bridge currently supports these primary transformations: Readers, Writers, Expressions, Aggregations, Lookups, Joins, and Filters. Any other transformations are left undefined.

Q: How do I provide metadata to the support team to reproduce an issue?
A: Create a backup of the Informatica PowerCenter metadata you are trying to import by using the pmrep command-line utility on a machine that has a full client installation of Informatica PowerCenter and can connect to the repository. Open a command-line window on the application server and go to the directory where pmrep.exe is located (the same directory as for the 'Path to the Informatica binary files' bridge parameter).

Enter the following commands:
- pmrep.exe
- connect -r Repo_name -n user_name -x password -h host -o port
(where these variables match the entries in the bridge for the PowerCenter Repository connection)
- backup -o c:\temp\InfaBackup.dat

The backup file is written to the Informatica PowerCenter Server and may be retrieved there.
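If you need to script this backup step, a small helper can assemble the pmrep command sequence before piping it to the utility. This is only a sketch: the connection values and the port number are hypothetical placeholders, and the subprocess call is left commented out because it requires a machine with the PowerCenter client installed.

```python
# Sketch: assemble the pmrep command sequence for a scripted backup.
# All values (repository, user, host, port, output path) are hypothetical
# placeholders -- substitute the entries from your bridge connection.
import subprocess  # only needed if you uncomment the call below

def build_pmrep_script(repo, user, password, host, port, backup_path):
    """Return the lines fed to pmrep's interactive prompt, one command per line."""
    return "\n".join([
        f"connect -r {repo} -n {user} -x {password} -h {host} -o {port}",
        f"backup -o {backup_path}",
        "exit",
    ])

script = build_pmrep_script("Repo_name", "user_name", "password",
                            "host", 6005, r"c:\temp\InfaBackup.dat")
# subprocess.run(["pmrep.exe"], input=script, text=True, check=True)
print(script)
```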

LIMITATIONS
Refer to the current general known limitations at http://metaintegration.com/Products/MIMB/MIMBKnownLimitations.html or bundled in Documentation/ReadMe/MIMBKnownLimitations.html
Note about the completeness of the metadata import for lineage presentation:

When one asks for the lineage of a data warehouse column, one expects an answer reflecting the run-time reality of the whole ETL process (execution plan). The process can involve multiple source systems, multiple phases, and multiple increments.

In the case of Informatica, an ETL process executes workflows. A workflow can be executed many times with different parameters. Not all types of parameters affect lineage: connectivity or SQL override parameters that change sources or targets affect lineage, while date parameters that control incremental loading do not.

The Informatica repository stores workflows in folders. The Informatica folder structure can be aligned with the execution process structure, but does not have to be.

It is recommended to import Informatica metadata according to the ETL execution process. To do so, consult with the data warehouse/ETL architect or administrator to obtain a list of all workflows and corresponding parameters that make up the whole ETL process.

Included in this analysis may be:

- multiple source systems - a single-source execution plan extracts data from a single instance of a single-source system. A multi-source execution plan extracts data from multiple instances of the same source system (Homogeneous), or multiple instances of dissimilar source systems (Heterogeneous). For example, a business might have an instance of ERP in one location and time zone and another instance of the same ERP in another location and time zone (Homogeneous). Or a business might have an instance of CRM in one location, an instance of ERP in another location, and a second instance of the ERP in yet a third location (Heterogeneous).

- multiple phases - extraction, loading, post-processing

- multiple increments - full, incremental and micro loads

SUPPORT
Provide a troubleshooting package with:
- the debug log (can be set in the UI or in conf/conf.properties with MIR_LOG_LEVEL=6)
- the metadata backup if available (can be set in the Miscellaneous parameter with option -backup)
Q: What is the cause of the error 'Failed to spawn process ...\pmrep.exe'?
A: While there are several possibilities for this error, especially the configuration issues already discussed, one additional possibility is that for very large folders in the Informatica Repository the pmrep.exe utility may run out of memory, because the client version is 32-bit when the bridge runs in a Windows environment. On the client side, Informatica always packages only a 32-bit pmrep; on the server side, it always packages a 64-bit pmrep, whether on Windows or Linux.

If you need a 64-bit pmrep for Windows, take it from the following locations in the 64-bit installation:
Server :
tools/pcutils (64 bit)
Client :
source\clients\DeveloperClient\pcutils\[version] (64 bit)


Bridge Parameters

Parameter Name Description Type Values Default Scope
Informatica Domain Informatica Domain Name. Not applicable for Informatica PowerCenter before 8.0. To connect to PowerCenter 7.x or earlier, use the host name and port number instead and keep this parameter empty. STRING
Informatica Security Domain Informatica Security Domain Name. Not applicable for Informatica PowerCenter before 8.0. To connect to PowerCenter 7.x or earlier, use the host name and port number instead and keep this parameter empty. STRING   Native
Gateway Host Name or Address Optional for Informatica PowerCenter 8.0 or later, but required for all versions before 8.0.

For PowerCenter 8.0 and later enter the Informatica Gateway node address. If this parameter is not specified, Informatica will try to locate this address automatically.

For PowerCenter 7.x and earlier enter the Informatica PowerCenter Repository Server address.
STRING      
Gateway Port Number Optional for Informatica PowerCenter 8.0 or later, but required for all versions before 8.0.

For PowerCenter 8.0 and later enter the Informatica Gateway node port number. If this parameter is not specified, Informatica will try to locate the port number automatically.

For PowerCenter 7.x and earlier enter the Informatica PowerCenter Repository Server port number.
NUMERIC      
Repository Name For PowerCenter 8.0 and later enter the Repository Service name.

For PowerCenter 7.x and earlier enter the Informatica PowerCenter Repository name.
STRING     Mandatory
Repository User Name Enter the Informatica PowerCenter Repository user name. If not specified, default anonymous access is used to read and import repository objects. If this parameter is specified, a matching password must be specified as well. The user must have 'read' permission on the PowerCenter folder(s) containing the metadata to import. The user must also have the 'Access Repository Manager' privilege to access the PowerCenter repository content. STRING   Administrator
Repository User Password Enter the Informatica PowerCenter Repository user password. This parameter is required only if Repository user name is specified. The password can be either plain text or encrypted using Informatica 'pmpasswd' utility. Be sure to set the 'Repository User Password is Encrypted' bridge parameter to tell the system what type of password is used.

This import bridge is warranted to be 'read only' and will never affect the Repository contents. It is therefore safe to attempt the initial metadata harvesting as 'Administrator' in order to ensure that the entire Repository content is extracted without any access permission issues. Eventually, the administrator can set up a 'read only' user.

Refer to the tool documentation on permissions and security for more details.
PASSWORD      
Repository User Password is Encrypted Set this option to 'True' if you are using an encrypted password based upon the Informatica PowerCenter Repository 'pmpasswd' utility for extra security. Otherwise, set this parameter to 'False'. BOOLEAN
False
True
False  
Path to PMREP The path to the PMREP application file installed with a PowerCenter client. The client must be compatible with the source PowerCenter Repository server, e.g.
'c:\Informatica\9.6.1\clients\PowerCenterClient\client\bin\pmrep.exe'
'etc/Informatica/bin/pmrep'
FILE *.exe   Mandatory
Folder Name Filter Enter the names of 'filter' folders in the Informatica PowerCenter Repository. Use this parameter if you need to limit the scope of repository structure browsing. If specified, the bridge will only look in these folders and their children for metadata to import. REPOSITORY_SUBSET
Object Types Select the metadata object type(s) to import.

'Sources only'
Source objects only

'Targets only'
Target objects only

'Workflows only'
Workflow objects only

'Mappings, sources and targets'
Source objects, target objects and mappings

'All supported objects'
All objects contained in a folder that are imported by this bridge.
ENUMERATED
Sources only
Targets only
Mappings, sources and targets
Workflows only
All supported objects
All supported objects  
Repository subset Enter a semicolon separated list of Informatica PowerCenter Repository object paths. A path is defined as folder_name/object_type/object_name where:

- folder_name is the PowerCenter folder name. The folder_name may be quoted or double quoted if it includes embedded spaces.
- object_type is one of the supported PowerCenter types: 'source', 'target', 'mapping' or 'workflow'.
- object_name is the PowerCenter repository object name. Use an asterisk (*) if you need all objects of the specified type.

E.g.
'Folder1'/source/account
Import source table with name 'account' from the folder with name 'Folder1'

'Folder1'/mapping/*
Import all mappings from the folder with name 'Folder1'

'Folder1'/source/*;'Folder2'/target/customer
Import all source tables from the folder with name 'Folder1' and one target table with the name 'customer' from folder 'Folder2'.
REPOSITORY_SUBSET      
Parameter files directory Directory containing Informatica PowerCenter parameters file(s). For detailed information about parameters files, please see the product documentation.

Informatica uses substitution parameters that are substituted with values at ETL execution (runtime). Just like Informatica itself, the bridge looks for these values in parameter files available to the bridge. This bridge parameter should be given a directory name, which is the root directory for all of these parameter files.

If the bridge cannot find a name=value pair for a given substitution parameter, the bridge will generally fail to parse the metadata correctly and report warnings and errors accordingly. It will also report the name of the undefined substitution parameter.
There are several ways in which to define substitution parameters and place the parameter files in the directory referred to here:
- If you are importing one workflow that uses a parameter file please name it 'parameters.prm' and place the file in the directory.

- If you are importing multiple workflows that re-use the same parameter file please name it 'parameters.prm' and place it in the directory.

- If you are importing multiple workflows that use different parameter files, place these files under the directory in sub-directories. Each parameter file is named after the workflow that uses it (with the extension '.prm') and is placed in a sub-directory named after the workflow's repository folder.
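Putting the three cases above together, the parameter files directory might look like this; all folder and workflow names are hypothetical examples:

```text
parameters/                     <-- value of 'Parameter files directory'
    parameters.prm              <-- global / shared parameter file
    Folder1/                    <-- named after the repository folder
        wf_load_customers.prm   <-- named after the workflow
        wf_load_orders.prm
    Folder2/
        wf_build_aggregates.prm
```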

Path Prefix:
You can use a special substitution parameter (see the variable definition section below), '$Static_Directory_Prefix@@', to prefix any relative paths for your parameter files. For example, if your session refers to a parameter file 'folder/subfolder/param.txt', then when this variable is defined, MIMB will prefix the value to the relative path and try to find the parameter file. You can also use this special substitution parameter to resolve any files, but only if your Informatica server is running on Unix and the parameter files are based on Unix absolute paths. For example, if your parameter files are referred to as '/opt/params/param.txt', then you can create this directory structure on the Windows machine and specify the top directory as the value for the special substitution parameter '$Static_Directory_Prefix@@'.

Groups: One may place group headers in the parameters.prm file in order to specify context for a name=value pair. Examples of groups:
[Global] - applies to all objects in the import.
[folder name.WF:workflow name.ST:session name] - applies to a specified session task in the specified workflow.
[folder name.WF:workflow name.WT:worklet name.ST:session name] - applies to a specified session task from a specified worklet in a specified workflow. If the session path has more than one worklet, use additional '.WT:worklet' constructs.
[folder name.session name] - applies to all sessions in the specified folder.
[folder name.workflow name] - applies to all workflows in the specified folder.
[session name] - applies to all sessions with specified name.

Examples of global vs. local group context:
- Defines source connection 'src1' name as 'customer_source_DB' for all imported objects:
[Global]
$DBConnection_src1=customer_source_DB

- Defines variable 'MyVar' value for session task 'session1' in the worklet 'task1' of the workflow 'WF1' in the folder 'Folder1':
[Folder1.WF:WF1.WT:task1.ST:session1]
$$MyVar=TEST

The bridge looks for substitution parameter values in the following order:
1.) If a session in Informatica has defined a specific substitution parameter file pathname, the bridge first looks for that file and, if found, looks for that substitution parameter name and value.
2.) Otherwise, if a workflow in Informatica has defined a specific substitution parameter file pathname, the bridge looks for that file and, if found, looks for that substitution parameter name and value.
3.) Otherwise, if there is a pathname in the directory specified here that matches the name of the workflow, the bridge looks there for that substitution parameter name and value. It works its way 'up' the directory structure (to a more general context) until it finds that substitution parameter name and value. One may specify a group in this file to apply name=value pairs to specific sessions.
4.) Otherwise, if there is still no value assigned, the bridge looks in the parameters.prm file in the directory specified here for the substitution parameter name and value.
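The lookup order above can be sketched as a simple most-specific-first search. In this sketch each dictionary is a hypothetical stand-in for a parsed .prm file; the real bridge reads the files listed in steps 1 to 4.

```python
# Sketch of the substitution-parameter lookup order described above.
# Each dict stands in for a parsed .prm file; names and values are
# hypothetical -- the real bridge reads the files listed in steps 1-4.

def resolve(name, session_file=None, workflow_file=None,
            workflow_dir_file=None, root_file=None):
    """Return the first value found, from the most to the least specific source."""
    sources = (
        session_file,       # 1) file explicitly attached to the session
        workflow_file,      # 2) file explicitly attached to the workflow
        workflow_dir_file,  # 3) <directory>/<folder>/<workflow>.prm
        root_file,          # 4) <directory>/parameters.prm
    )
    for source in sources:
        if source and name in source:
            return source[name]
    return None  # the bridge would report an undefined substitution parameter

# A workflow-level value overrides the root parameters.prm:
value = resolve("$DBConnection_src1",
                workflow_file={"$DBConnection_src1": "customer_source_DB_test"},
                root_file={"$DBConnection_src1": "customer_source_DB"})
print(value)  # customer_source_DB_test
```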

When the bridge reports that a particular substitution parameter value is not found, this situation may occur when either:
- You have not collected all of the proper parameter files used by Informatica when executing ETL
- There are additional substitution parameter assignments made globally through environment variables to Informatica when executing ETL.

Obviously, in the first case you should obtain the correct set of parameter files rather than try to reproduce these assignments by hand. In the second case, however, here is the process to follow:
1.) Add the substitution parameter name=value pair to the parameters.prm file in the directory specified here. This value will then apply globally but be overridden if that same substitution parameter is defined in a narrower context. Hence, it will address the missing substitution parameter issue but not disturb already defined values.
2.) If you need to provide different values depending upon the context (say different workflows have different substitution values), add the substitution parameter name=value pair in a group (see above). This value will then apply only to the context defined in the group header, but again will be overridden if that same substitution parameter is defined in a narrower context. Hence, it will address the missing substitution parameter issue but not disturb already defined values.
3.) If you need to provide different values depending upon the context (say different workflows have different substitution values), you may also add the substitution parameter name=value pair in a file in the sub-directory structure for that context. Again, this value will then apply only to the context where you placed this name=value pair, however it is given precedence over any name=value pairs defined in a group in the root parameters.prm file. Hence, it will address the missing substitution parameter issue and be the value substituted, unless the session in Informatica was defined with a specific pathname for the substitution parameters.
4.) There should be no need to update the files at pathnames defined within Informatica for specific sessions, as these should be collected properly and made available to the bridge.

Connection types:
You can define the database type for a connection (that is not properly defined in Informatica) with a substitution parameter name like:
Connection.[name].DBTYPE

For example, if a connection with the name 'ODBC_Connection' is assigned the Oracle database type at runtime, you can use the 'DBTYPE' name=value pair (in this case the bridge will know that 'ODBC_Connection' is of type Oracle and will use the proper handler to parse its metadata):
Connection.ODBC_Connection.DBTYPE=ORACLE

The database type is case insensitive. The list of possible values: Oracle, ODBC, Netezza, Microsoft SQL Server, DB2, Sybase, Informix, SAP BW, SAP R/3, Teradata.

Aliases:
When you know that two connections with different names target the same data in a database, use an 'ALIAS' to tell the bridge to treat them as the same data source in the lineage, like this:
Connection.ODBC_Connection.ALIAS=oracleDB
Connection.Oracle_Connection.ALIAS=oracleDB
With these definitions, the names of the specified connections will be replaced with 'oracleDB' at runtime, and lineages will be computed accordingly.

Schemas:
You can override the default schema for a connection with 'SCHEMA':
Connection.DB_Conn.SCHEMA=dbo
In this case, an empty schema for DB_Conn will be replaced with 'dbo'.
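For reference, here is a minimal sketch of a root parameters.prm combining the connection overrides above. The connection names are the hypothetical ones from the examples, and placing everything under [Global] is an assumption for illustration, not a documented requirement:

```text
[Global]
$DBConnection_src1=customer_source_DB
Connection.ODBC_Connection.DBTYPE=ORACLE
Connection.ODBC_Connection.ALIAS=oracleDB
Connection.Oracle_Connection.ALIAS=oracleDB
Connection.DB_Conn.SCHEMA=dbo
```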
DIRECTORY      
Cache control If true, the bridge will get changes from the last start. Otherwise all caches will be dropped and skipped. BOOLEAN
False
True
False  
Miscellaneous INTRODUCTION
Specify miscellaneous options starting with a dash and optionally followed by parameters, e.g.
-connection.cast MyDatabase1="SQL Server"
Some options can be used multiple times if applicable, e.g.
-connection.rename NewConnection1=OldConnection1 -connection.rename NewConnection2=OldConnection2;
As the list of options can become a long string, it is possible to load it from a file, which must be located in ${MODEL_BRIDGE_HOME}\data\MIMB\parameters and have the extension .txt. In that case, all options must be defined within that file as the only value of this parameter, e.g.
ETL/Miscellaneous.txt

JAVA ENVIRONMENT OPTIONS
-m <Java Memory's maximum size>
1G by default on 64bits JRE or as set in conf/conf.properties, e.g.
-m 8G
-m 8000M

-j <Java Runtime Environment command line options>
This option must be the last one in the Miscellaneous parameter as all the text after -j is passed "as is" to the JRE, e.g.
-j -Dname=value -Xms1G
The following option must be set when a proxy is used to access the internet (this is critical to access https://repo.maven.apache.org/maven2/ and exceptionally a few other tool sites) in order to download the necessary third-party software libraries.
-j -Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=3128 -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=3128 -Dhttp.proxyUser=user -Dhttp.proxyPassword=pass -Dhttps.proxyUser=user -Dhttps.proxyPassword=pass

-jre <Java Runtime Environment full path name>
It can be an absolute path to javaw.exe on Windows or a link/script path on Linux, e.g.
-jre "c:\Program Files\Java\jre1.8.0_211\bin\javaw.exe"

-v <Environment variable value>
None by default, e.g.
-v var1=value1 -v var2="value2 with spaces"

MODEL IMPORT OPTIONS
-model.name <model name>
Override the model name, e.g.
-model.name "My Model Name"

-prescript <script name>
The script must be located in the bin directory, and have .bat or .sh extension.
The script path must not include any parent directory symbol (..).
The script should return exit code 0 to indicate success, or another value to indicate failure.
For example:
-prescript "script.bat arg1 arg2"

-cache.clear
Clears the cache before the import, and therefore will run a full import without incremental harvesting.
Warning: this is a system option managed by the application calling the bridge and should not be set by users.

-backup <directory>
Full path of an empty directory to save the metadata input files for further troubleshooting.

DATA CONNECTION OPTIONS
Data connections are produced by the import bridges, typically from ETL/DI and BI tools, to refer to the source and target data stores they use. These data connections are then used by metadata management tools to connect them (metadata stitching) to their actual data stores (e.g. databases, file systems, etc.) in order to produce the full end-to-end data flow lineage and impact analysis. The name of each data connection is unique by import model. The data connection names used within DI/BI design tools are used when possible, otherwise connection names are generated to be short but meaningful, such as the database/schema name, the file system path, or Uniform Resource Identifier (URI). The following options allow you to manipulate connections. These options replace the legacy options -c, -cd, and -cs.

-connection.cast ConnectionName=ConnectionType
Casts a generic database connection (e.g. ODBC/JDBC) to a precise database type (e.g. ORACLE) for SQL Parsing, e.g.
-connection.cast "My Database"="SQL SERVER".
The list of supported data store connection types includes:
ACCESS
CASSANDRA
DB2
DENODO
HIVE
MYSQL
NETEZZA
ORACLE
POSTGRESQL
PRESTO
REDSHIFT
SALESFORCE
SAP HANA
SNOWFLAKE
SQL SERVER
SYBASE
TERADATA
VECTORWISE
VERTICA

-connection.rename OldConnection=NewConnection
Renames an existing connection to a new name, e.g.
-connection.rename OldConnectionName=NewConnectionName
Multiple existing database connections can be renamed and merged into one new database connection, e.g.
-connection.rename MySchema1=MyDatabase -connection.rename MySchema2=MyDatabase

-connection.split oldConnection.Schema1=newConnection
Splits a database connection into one or multiple database connections.
A single database connection can be split into one connection per schema, e.g.
-connection.split MyDatabase
All database connections can be split into one connection per schema, e.g.
-connection.split *
A database connection can be explicitly split creating a new database connection by appending a schema name to a database, e.g.
-connection.split MyDatabase.schema1=MySchema1

-connection.map DestinationPath=SourcePath
Maps a source path to a destination path. This is useful for file system connections when different paths point to the same object (directory or file).
On Hadoop, a process can write into a CSV file specified with the HDFS full path, but another process reads from a HIVE table implemented (external) by the same file specified using a relative path with default file name and extension, e.g.
-connection.map hdfs://host:8020/users/user1/folder/file.csv=/user1/folder
On Linux, a given directory (or file) like /data can be referred to by multiple symbolic links like /users/john and /users/paul, e.g.
-connection.map /users/John=/data -connection.map /users/paul=/data
On Windows, a given directory like C:\data can be referred to by multiple network drives like M: and N:, e.g.
-connection.map M:\=C:\data -connection.map N:\=C:\data

-pppd
Create the connections and connection data sets in DI/ETL design models. This feature should only be used when intending to export to another DI/ETL tool.

INFORMATICA POWERCENTER OPTIONS
-e
Allow any parameter file extension. Otherwise, only .TXT and .PRM file extensions for 'Parameter files directory' are supported.

-i
Remove illegal XML characters.

-l <path>
Set INFA_HOME_PATH environment variable to the directory where PMREP is installed.

-p <file path>
The file path to the XML file or directory imported by the bridge from the Informatica tool. Used for debugging purposes.

-f
Use this option if the bridge should import each folder separately.

-cp <string>
Specify the code page value to use when the INFA_CODEPAGENAME environment variable is not defined. Please reference the Informatica documentation for details about supported code page values of the variable. E.g.
-cp UTF-8

-cd <string>
Split or merge file system connections by a directory path.
Useful to split a connection into two or more root folders that can be stitched to two or more file stores. To split the connection 'orig_con' into two connections, 'a_con' based upon the 'a' folder while the rest remains in the 'orig_con' connection:
-cd a_con=orig_con:/a

To split the connection 'orig_con' into two connections, 'a_con' based upon the 'root/a' folder:
-cd a_con=orig_con:/root/a

Useful to merge several connections into one. To merge two connections 'C:' with a root folder 'R1' and 'B:' with a root folder 'R2' into one connection 'C:' with two root folders - 'R1' and 'R2':
-cd C:\=B:\
STRING      

 

Bridge Mapping

Meta Integration Repository (MIR)
Metamodel
(based on the OMG CWM standard)
"Informatica PowerCenter (Repository)"
Metamodel
InformaticaPowerCenterRepository
Mapping Comments
     
Association SourceField, TargetField Built from Foreign Key information
Aggregation KeyType, Nullable  
AssociationRole SourceField, TargetField  
Multiplicity   Based on Source information and KeyType attribute
Source   Based on Foreign Key information
AssociationRoleNameMap SourceField, TargetField  
Attribute SourceField, TargetField  
Description Description  
Name BusinessName Set to Name if BusinessName is not set
Optional Nullable  
PhysicalName Name Attribute physical name. Computed from 'name' if not set
Position FieldNumber  
BaseType SourceField, TargetField  
DataType Datatype See datatype conversion array
Name   Based on the datatype
PhysicalName   Derived from the datatype
CandidateKey SourceField, TargetField, TargetIndex  
UniqueKey KeyType, TargetIndex.Unique  
Class Source, Target  
CppClassType   Set to ENTITY
CppPersistent   Set to True
Description Description  
DimensionalRole dimensional Role metadata extension  
DimensionalType dimensional Role Type metadata extension  
Name BusinessName Set to Name if BusinessName is not set
PhysicalName Name Class physical name. Computed from 'name' if not set
ClassifierMap Transformation, Instance  
Name Name  
DataAttribute TransformField  
Description Description  
Name Name  
DataSet Transformation, Instance  
Name Name  
DatabaseSchema Source, Target Tables and views belonging to the owner
Name DBD name or connection name DBD name is for sources only. Connection name if present.
DerivedType SourceField, TargetField  
DataType Datatype See datatype conversion array
Length Precision  
Name Datatype See datatype conversion array
PhysicalName   Derived from the datatype
Scale Scale  
UserDefined   Set to False
DesignPackage Repository, Folder, FolderVersion, Source, Target A SOURCE and a TARGET package are created to hold the source sub-packages and the target Classes, respectively. Packages are also created for Mappings and Workflows.
Description Description  
Name Name, Source.DBDName  
UserDefined   Set to True
FeatureMap TransformField  
Name Name  
Operation Expression  
ForeignKey SourceField, TargetField  
Index TargetIndex  
Name Name  
IndexMember TargetIndexField  
Position   Position in the XML file
StoreModel PowerCenter file  
Name   Based on the file name