Informatica PowerCenter (Repository) - Import - 7.1

Talend Data Catalog Bridges

author
Talend Documentation Team
EnrichVersion
7.1
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Data Catalog

Bridge Requirements

This bridge:
  • requires the tool to be installed to access its SDK.

Bridge Specifications

Vendor Informatica
Tool Name PowerCenter
Tool Version 8.x to 10.x
Tool Web Site http://www.informatica.com/products_services/powercenter/Pages/index.aspx
Supported Methodology [Data Integration] Multi-Model, Metadata Repository, ETL (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via Command line API on Repository based XML File
Data Profiling
Multi-Model Harvesting
Incremental Harvesting
Remote Repository Browsing for Model Selection

BRIDGE INFORMATION
Import tool: Informatica PowerCenter 8.x to 10.x (http://www.informatica.com/products_services/powercenter/Pages/index.aspx)
Import interface: [Data Integration] Multi-Model, Metadata Repository, ETL (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via Command line API on Repository based XML File from Informatica PowerCenter (Repository)
Import bridge: 'InformaticaPowerCenterRepository' 10.1.0

BRIDGE DOCUMENTATION
Imports directly from Informatica PowerCenter Repository.
Before attempting to import from the Informatica PowerCenter Repository using this bridge please do the following:
1. Confirm that the Informatica client software is installed on the machine where the bridge is to be run.
2. Use the username and password intended for the bridge to open Informatica Designer and view the specific metadata to be imported.


Note about completeness of the Metadata Import for Lineage Presentation.

When one asks for a lineage of a data warehouse column one expects to get an answer reflecting the run-time reality of the whole ETL process (execution plan). The process can involve multiple source systems, multiple phases and multiple increments.

In the case of Informatica, an ETL process executes workflows. A workflow can be executed many times with different parameters. Not all types of parameters affect lineage. Connectivity or SQL override parameters that change sources or targets affect lineage. Date parameters that control incremental loading do not affect lineage.

The Informatica repository stores workflows in folders. The Informatica folder structure can be aligned with the execution process structure but does not have to be.

It is recommended to import Informatica metadata according to the ETL execution process. To do so, consult the data warehouse/ETL architect/administrator to obtain a list of all workflows and corresponding parameters that make up the whole ETL process.

Included in this analysis may be:

- multiple source systems - a single-source execution plan extracts data from a single instance of a single-source system. A multi-source execution plan extracts data from multiple instances of the same source system (Homogeneous), or multiple instances of dissimilar source systems (Heterogeneous). For example, a business might have an instance of ERP in one location and time zone and another instance of the same ERP in another location and time zone (Homogeneous). Or a business might have an instance of CRM in one location, an instance of ERP in another location, and a second instance of the ERP in yet a third location (Heterogeneous).

- multiple phases - extraction, loading, post-processing

- multiple increments - full, incremental and micro loads


FREQUENTLY ASKED QUESTIONS
Q: What PowerCenter transformations are supported by this bridge?
A: The bridge currently supports these primary transformations: Readers, Writers, Expressions, Aggregations, Lookups, Joins, and Filters. Any others will be undefined.

Q: How do I provide metadata to the support team to reproduce an issue?

A: Create a backup of the Informatica PowerCenter metadata you are trying to import by using the pmrep command on the command line on a machine that has a full client installation of Informatica PowerCenter and is able to connect to the repository.

- Open a command line window on the app server and go to the directory where pmrep.exe is (same as for the 'Path to the Informatica binary files' parameter for the bridge)

- Enter the following commands:
- pmrep.exe
- connect -r Repo_name -n user_name -x password -h host -o port
where these variables match the entries in the bridge for the PowerCenter Repository connection
- backup -o c:\temp\InfaBackup.dat

The backup file is written to the Informatica PowerCenter Server and may be retrieved there.
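
The interactive commands above can be assembled from the same values entered in the bridge parameters. The following is a minimal sketch in Python, not part of the bridge: the helper name pmrep_backup_session is hypothetical, and it assumes none of the values contain spaces, since no quoting is attempted. It only formats the two lines to type at the pmrep prompt and does not run pmrep.

```python
from typing import List


def pmrep_backup_session(
    repo_name: str,
    user_name: str,
    password: str,
    host: str,
    port: int,
    backup_file: str,
) -> List[str]:
    """Return the lines to enter at the pmrep prompt, in order.

    Hypothetical helper: it mirrors the 'connect' and 'backup' commands
    shown in this FAQ entry and adds no other flags.
    """
    connect = (
        f"connect -r {repo_name} -n {user_name} "
        f"-x {password} -h {host} -o {port}"
    )
    backup = f"backup -o {backup_file}"
    return [connect, backup]


if __name__ == "__main__":
    # Placeholder values; replace them with the bridge's connection entries.
    for line in pmrep_backup_session(
        "Repo_name", "user_name", "password", "host", 6001,
        r"c:\temp\InfaBackup.dat",
    ):
        print(line)
```

The port value 6001 is only a placeholder for illustration. Avoid storing the generated lines in a file, since the password appears in plain text.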


Q: What is the cause of the error 'Failed to spawn process ...\pmrep.exe'?
A: There are several possible causes of this error, including the configuration issues already discussed. One additional possibility, when the bridge runs in a Windows environment, is that the utility (pmrep.exe) cannot use enough memory for very large folders in the Informatica Repository because it is a 32-bit version. On the client side, Informatica always packages only a 32-bit pmrep; on the server side, it always packages a 64-bit pmrep, whether on Windows or Linux.

If you need a 64-bit pmrep for Windows, take it from the following locations in the 64-bit installation:
Server :
tools/pcutils (64 bit)
Client :
source\clients\DeveloperClient\pcutils\[version] (64 bit)


Bridge Parameters

Parameter Name Description Type Values Default Scope
Informatica Domain Informatica Domain Name. This parameter is not applicable for Informatica PowerCenter before 8.0. For connecting to PowerCenter 7.x or earlier, use the host name and port number instead and keep this parameter empty. STRING      
Informatica Security Domain Informatica Security Domain Name. This parameter is not applicable for Informatica PowerCenter before 8.0. For connecting to PowerCenter 7.x or earlier, use the host name and port number instead and keep this parameter empty. STRING   Native  
Gateway Host Name or Address This parameter is optional for Informatica PowerCenter 8.0 or later, but is required for all versions before 8.0.

For PowerCenter 8.0 and later enter the Informatica Gateway node address. If this parameter is not specified, Informatica will try to locate this address automatically.

For PowerCenter 7.x and earlier enter the Informatica PowerCenter Repository Server address.
STRING      
Gateway Port Number This parameter is optional for the Informatica PowerCenter 8.0 or later but is required for all versions before 8.0.

For PowerCenter 8.0 and later enter the Informatica Gateway node port number. If this parameter is not specified, Informatica will try to locate the port number automatically.

For PowerCenter 7.x and earlier enter the Informatica PowerCenter Repository Server port number.
NUMERIC      
Repository Name For PowerCenter 8.0 and later enter the Repository Service name.

For PowerCenter 7.x and earlier enter the Informatica PowerCenter Repository name.
STRING     Mandatory
Repository User Name Enter the Informatica PowerCenter Repository user name. If this parameter is not specified, default anonymous access is used to read and import repository objects. If this parameter is specified, a matching password must be specified as well. The user must have 'read' permission on PowerCenter folder(s) containing the metadata to import. Also the user must have 'Access Repository Manager' privilege to access the PowerCenter repository content. STRING   Administrator  
Repository User Password Enter the Informatica PowerCenter Repository user password. This parameter is required only if Repository user name is specified. The password can be either plain text or encrypted using Informatica 'pmpasswd' utility. Be sure to set the 'Repository User Password is Encrypted' bridge parameter to tell the system what type of password is used.

This import bridge is warrantied to be 'read only' and will never affect the Repository contents. It is therefore safe to attempt the initial metadata harvesting as 'Administrator' in order to ensure that the entire Repository content is extracted without any access permission issue. Eventually, the administrator can set up a 'read only' user.

Refer to the tool documentation on permissions and security for more details.
PASSWORD      
Repository User Password is Encrypted Set this option to 'True' if you are using an encrypted password based upon the Informatica PowerCenter Repository 'pmpasswd' utility for extra security. Otherwise, set this parameter to 'False'. BOOLEAN
False
True
False  
Path to PMREP The path to the PMREP application file installed with a PowerCenter client. The client must be compatible with the source PowerCenter Repository server.

Examples:
'c:\Informatica\9.6.1\clients\PowerCenterClient\client\bin\pmrep.exe'
'etc/Informatica/bin/pmrep'
FILE *.exe   Mandatory
Folder Name Filter Enter the name of the 'filter' folders in the Informatica PowerCenter Repository. Use this parameter if you need to limit the scope of repository structure browsing. If specified, the bridge will only look in these folders and any children for metadata to import. REPOSITORY_SUBSET      
Object Types Select the metadata object type(s) to import.
'Sources only' - Source objects only
'Targets only' - Target objects only
'Workflows only' - Workflow objects only
'Mappings, sources and targets' - Source objects, target objects and mappings
'All supported objects' - All objects contained in a folder that are imported by this bridge.
ENUMERATED
Sources only
Targets only
Mappings, sources and targets
Workflows only
All supported objects
All supported objects  
Repository subset Enter a semicolon separated list of Informatica PowerCenter Repository object paths. A path is defined as folder_name/object_type/object_name where:
folder_name is the PowerCenter folder name. The folder_name may be quoted or double quoted if it includes embedded spaces.
object_type is one of the supported PowerCenter types: 'source', 'target', 'mapping' or 'workflow'.
object_name is the PowerCenter repository object name. Use an asterisk (*) if you need all objects of specified type.

Examples: 'Folder1'/source/account - import source table with name 'account' from the folder with name 'Folder1'
'Folder1'/mapping/* - import all mappings from the folder with name 'Folder1'
'Folder1'/source/*;'Folder2'/target/customer - import all source tables from the folder with name 'Folder1' and one target table with the name 'customer' from folder 'Folder2'.
REPOSITORY_SUBSET      
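
To check a 'Repository subset' value before running the import, the path format described above can be parsed with a few lines of Python. This is only a sketch under stated assumptions: the function and type names are hypothetical, folder names are assumed not to contain semicolons, and object names are assumed not to contain slashes. It does not reproduce the bridge's own parser.

```python
from typing import List, NamedTuple

# Object types listed in the 'Repository subset' description.
SUPPORTED_TYPES = {"source", "target", "mapping", "workflow"}


class SubsetPath(NamedTuple):
    folder_name: str
    object_type: str
    object_name: str


def parse_repository_subset(value: str) -> List[SubsetPath]:
    """Split a semicolon-separated list of folder/type/name paths."""
    paths = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Split from the right so the last two segments are type and name.
        parts = entry.rsplit("/", 2)
        if len(parts) != 3:
            raise ValueError(
                f"expected folder_name/object_type/object_name: {entry!r}"
            )
        folder, object_type, object_name = parts
        # Remove one pair of matching single or double quotes, if present.
        if len(folder) >= 2 and folder[0] == folder[-1] and folder[0] in "'\"":
            folder = folder[1:-1]
        if object_type not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported object type: {object_type!r}")
        paths.append(SubsetPath(folder, object_type, object_name))
    return paths


if __name__ == "__main__":
    for path in parse_repository_subset(
        "'Folder1'/source/*;'Folder2'/target/customer"
    ):
        print(path)
```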
Parameter files directory Directory containing Informatica PowerCenter parameters file(s). For detailed information about parameters files, please see the product documentation.


Informatica uses substitution parameters that are substituted with values at ETL execution (runtime). Just like Informatica itself, the bridge looks for these values in parameter files available to the bridge. This bridge parameter should be given a directory name, which is the root directory for all of these parameters files.

If the bridge cannot find a name=value pair for a given substitution parameter, the bridge will generally fail to parse the metadata correctly and report warnings and errors accordingly. It will also report the name of the undefined substitution parameter.
There are several ways in which to define substitution parameters and place the parameter files in the directory referred to here:
- If you are importing one workflow that uses a parameter file please name it 'parameters.prm' and place the file in the directory.

- If you are importing multiple workflows that re-use the same parameter file please name it 'parameters.prm' and place it in the directory.

- If you are importing multiple workflows that use different parameter files please place these files under the directory in sub-directories. Each parameter file has a name of the workflow that uses it (with extension '.prm') and is placed in a sub-directory that has a name of the workflow's repository folder.


Path Prefix:
You can use the special substitution parameter '$Static_Directory_Prefix@@' (see the variable definition section below) to prefix any relative paths for your parameter files. For example, if your session refers to a parameter file 'folder/subfolder/param.txt' and this variable is defined, MIMB prefixes its value to the relative path and tries to find the parameter file there. You can also use this special substitution parameter to resolve files referenced by absolute paths, but only if your Informatica server is running on Unix and the parameter files are based on Unix absolute paths. For example, if your parameter files are referred to as '/opt/params/param.txt', you can create this directory structure on the Windows machine and specify the top directory as the value for the special substitution parameter '$Static_Directory_Prefix@@'.

Groups: One may place group headers in the parameters.prm file in order to specify context for a name=value pair. Examples of groups:
[Global] - applies to all objects in the import.
[folder name.WF:workflow name.ST:session name] - applies to a specified session task in the specified workflow.
[folder name.WF:workflow name.WT:worklet name.ST:session name] - applies to a specified session task from a specified worklet in a specified workflow. If the session path has more than one worklet, use additional '.WT:worklet' constructs.
[folder name.session name] - applies to all sessions in the specified folder.
[folder name.workflow name] - applies to all workflows in the specified folder.
[session name] - applies to all sessions with specified name.

Examples of global vs. local group context:
- Defines source connection 'src1' name as 'customer_source_DB' for all imported objects:
[Global]
$DBConnection_src1=customer_source_DB

- Defines variable 'MyVar' value for session task 'session1' in the worklet 'task1' of the workflow 'WF1' in the folder 'Folder1':
[Folder1.WF:WF1.WT:task1.ST:session1]
$$MyVar=TEST
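
To inspect which name=value pairs fall under which group header, a parameter file can be read into a nested dictionary. The sketch below is illustrative only: the function name is hypothetical, and treating pairs that appear before any header as belonging to 'Global' is an assumption, not documented bridge behavior.

```python
from typing import Dict


def parse_parameter_file(text: str) -> Dict[str, Dict[str, str]]:
    """Group name=value pairs by the [group] header they appear under."""
    groups: Dict[str, Dict[str, str]] = {}
    current = "Global"  # assumption: pairs before any header are global
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            groups.setdefault(current, {})
            continue
        name, separator, value = line.partition("=")
        if not separator:
            continue  # not a name=value pair; ignored in this sketch
        groups.setdefault(current, {})[name.strip()] = value.strip()
    return groups


if __name__ == "__main__":
    sample = """\
[Global]
$DBConnection_src1=customer_source_DB

[Folder1.WF:WF1.WT:task1.ST:session1]
$$MyVar=TEST
"""
    for group, pairs in parse_parameter_file(sample).items():
        print(group, pairs)
```

A hand-written parser is used here instead of Python's configparser because configparser lowercases option names by default and also treats ':' as a name/value delimiter.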


The bridge looks for substitution parameter values in the following order:
1.) If a session in Informatica actually has defined a specific substitution parameter pathname, then the bridge first looks for that file and if found looks for that substitution parameter name and value
2.) Otherwise, if a workflow in Informatica actually has defined a specific substitution parameter pathname, then the bridge first looks for that file and if found looks for that substitution parameter name and value
3.) Otherwise, if there is a pathname in the directory specified here that matches the name of the workflow, the bridge then looks for that substitution parameter name and value. It works its way 'up' the directory structure (to a more general context) until it finds that substitution parameter name and value. One may specify a group in this file to apply name=value pairs to specific sessions.
4.) Otherwise, if there is still no value assigned, the bridge looks in the parameters.prm file in the directory specified here for the substitution parameter name and value.
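
The lookup order above amounts to a first-match search across four sources. As a rough illustration only, it can be modeled with Python's collections.ChainMap, which returns the value from the first mapping that contains the key. The function name and the four dictionary arguments are hypothetical. The sketch ignores group headers and the walk 'up' the directory structure, so it is a simplification of what the bridge does.

```python
from collections import ChainMap
from typing import Dict, Optional


def resolve_parameter(
    name: str,
    session_file: Dict[str, str],
    workflow_file: Dict[str, str],
    directory_file: Dict[str, str],
    root_file: Dict[str, str],
) -> Optional[str]:
    """Return the first value found, searching in the documented order.

    1. file named by the session, 2. file named by the workflow,
    3. workflow-named file in the parameter directory,
    4. root parameters.prm. Returns None when the name is undefined.
    """
    chain = ChainMap(session_file, workflow_file, directory_file, root_file)
    return chain.get(name)


if __name__ == "__main__":
    session = {}
    workflow = {"$$MyVar": "FROM_WORKFLOW"}
    directory = {"$$MyVar": "FROM_DIR", "$DBConnection_src1": "dir_db"}
    root = {"$DBConnection_src1": "customer_source_DB"}
    print(resolve_parameter("$$MyVar", session, workflow, directory, root))
    print(resolve_parameter("$DBConnection_src1", session, workflow, directory, root))
    print(resolve_parameter("$$Missing", session, workflow, directory, root))
```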


When the bridge reports that a particular substitution parameter value is not found, this situation may occur when either:
- You have not collected all of the proper parameter files used by Informatica when executing ETL
- There are additional substitution parameter assignments made globally through environment variables to Informatica when executing ETL.

Obviously, if it is the first case, you should obtain the correct set of parameter files and not try to reproduce these assignments by hand. However, in the second case, here is the process one should follow to address this situation:
1.) Add the substitution parameter name=value pair to the parameters.prm file in the directory specified here. This value will then apply globally but be overridden if that same substitution parameter is defined in a narrower context. Hence, it will address the missing substitution parameter issue but not disturb already defined values.
2.) If you need to provide different values depending upon the context (say different workflows have different substitution values), add the substitution parameter name=value pair in a group (see above). This value will then apply only to the context defined in the group header, but again will be overridden if that same substitution parameter is defined in a narrower context. Hence, it will address the missing substitution parameter issue but not disturb already defined values.
3.) If you need to provide different values depending upon the context (say different workflows have different substitution values), you may also add the substitution parameter name=value pair in a file in the sub-directory structure for that context. Again, this value will then apply only to the context where you placed this name=value pair, however it is given precedence over any name=value pairs defined in a group in the root parameters.prm file. Hence, it will address the missing substitution parameter issue and be the value substituted, unless the session in Informatica was defined with a specific pathname for the substitution parameters.
4.) There should be no need to update the files at pathnames defined within Informatica for specific sessions, as these should be collected properly and made available to the bridge.


Connection types:
You can define the target DB type for a connection (that is not properly defined in Informatica) with a substitution parameter name like:
Connection.[name].DBTYPE

For example, if a connection with the name 'ODBC_Connection' is assigned the Oracle database type at runtime, you can use the 'DBTYPE' name=value pair (in this case the bridge will know that 'ODBC_Connection' is of type Oracle and will use the proper handler to parse its metadata):
Connection.ODBC_Connection.DBTYPE=ORACLE

The database type is case insensitive. The list of possible values: Oracle, ODBC, Netezza, Microsoft SQL Server, DB2, Sybase, Informix, SAP BW, SAP R/3, Teradata.

Aliases:
When you know that two connections with different names target the same data in a database, you can use an 'ALIAS' to tell the bridge to treat them as the same data source in the lineage, like this:
Connection.ODBC_Connection.ALIAS=oracleDB
Connection.Oracle_Connection.ALIAS=oracleDB
With these definitions, the names of the specified connections are replaced with 'oracleDB' at runtime and lineage is computed accordingly.

Schemas:
You can override the default schema for a connection with 'SCHEMA':
Connection.DB_Conn.SCHEMA=dbo
In this case, an empty schema for DB_Conn is replaced with 'dbo'.
DIRECTORY      
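
The Connection.[name].DBTYPE, ALIAS, and SCHEMA entries described for this parameter can be collected per connection to review them before an import. The following Python sketch assumes the name=value pairs are already available as a dictionary. The function name is hypothetical, and matching the key suffix exactly in upper case is an assumption.

```python
from typing import Dict

# Suffixes described under 'Connection types', 'Aliases' and 'Schemas'.
SUPPORTED_KEYS = {"DBTYPE", "ALIAS", "SCHEMA"}
PREFIX = "Connection."


def collect_connection_overrides(
    pairs: Dict[str, str],
) -> Dict[str, Dict[str, str]]:
    """Group Connection.<name>.<KEY>=value pairs by connection name."""
    overrides: Dict[str, Dict[str, str]] = {}
    for name, value in pairs.items():
        if not name.startswith(PREFIX):
            continue  # ordinary substitution parameter, not an override
        # The key is taken as the text after the last dot.
        connection, separator, key = name[len(PREFIX):].rpartition(".")
        if not separator or key not in SUPPORTED_KEYS:
            continue
        overrides.setdefault(connection, {})[key] = value
    return overrides


if __name__ == "__main__":
    pairs = {
        "Connection.ODBC_Connection.DBTYPE": "ORACLE",
        "Connection.ODBC_Connection.ALIAS": "oracleDB",
        "Connection.Oracle_Connection.ALIAS": "oracleDB",
        "Connection.DB_Conn.SCHEMA": "dbo",
        "$$MyVar": "TEST",
    }
    for connection, settings in collect_connection_overrides(pairs).items():
        print(connection, settings)
```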
Cache control If true, the bridge will get changes from the last start. Otherwise all caches will be dropped and skipped. BOOLEAN
False
True
False  
Miscellaneous Specify miscellaneous options identified with a -letter and value.

For example, -l -c -m 4G -j -Dname=value -Xms1G

-l set INFA_HOME_PATH environment variable to the directory where PMREP is located.
-e allow any parameter file extensions. By default .TXT and .PRM file extensions for 'Parameter files directory' are supported.
-i remove illegal xml characters.

-v set environment variable(s) (e.g. -v var1=value -v var2='value with spaces').
-m the maximum Java memory size whole number (e.g. -m 4G or -m 2500M ).
-p the file path to xml or directory imported by the bridge from the Informatica tool. Used for debug purposes.
-f use this option if the bridge should import each folder separately.
-cp code page value to use (e.g. -cp UTF-8). Specify when the INFA_CODEPAGENAME environment variable is not defined. Please reference the Informatica documentation for details about supported code page values of the variable.

-cd: split or merge file system connections by a directory path.

For example, a connection can have two root folders, a and b. When you imported separate File stores for each root folder you would want to split the connection into two connections that can be resolved using these File stores. It can be achieved with option:
-cd a_con=orig_con:/a
This requests that the 'a_con' connection be created and the 'a' folder moved to it from the 'orig_con' connection. The result has the a_con and orig_con connections. The orig_con connection keeps the folder branch 'b' that is left over after splitting out the folder branch 'a'.

Here is a slightly more complex example:
-cd a_con=orig_con:/root/a - create the 'a_con' connection for the 'root/a' folder branch in the 'orig_con' connection.

You can also use the option to merge several connections into one. For example, when you have two file stores, C:\a and B:\b, you can merge them with the option:
-cd C:\=B:\
This moves all folders from the B:\ connection to the C:\ connection, which ends up with the root folders a and b.

-pppd enables the DI/ETL post-processor processing of DI/ETL designs in order to create the design connections and connection data sets.
-j the last option that is followed by Java command line options (e.g. -j -Dname=value -Xms1G).
STRING      
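
Because -j must be the last bridge option and everything after it is passed as Java command line options, a Miscellaneous string can be checked by splitting it at -j. The sketch below uses Python's shlex module with POSIX-style tokenization, so quoted values such as 'value with spaces' stay together. The function name is hypothetical, and values containing backslashes (for example Windows paths in -cd) are not handled correctly by this tokenization.

```python
import shlex
from typing import List, Tuple


def split_misc_options(text: str) -> Tuple[List[str], List[str]]:
    """Return (bridge_tokens, java_tokens), split at the -j option."""
    tokens = shlex.split(text)
    if "-j" in tokens:
        index = tokens.index("-j")
        return tokens[:index], tokens[index + 1:]
    return tokens, []


if __name__ == "__main__":
    bridge, java = split_misc_options("-l -c -m 4G -j -Dname=value -Xms1G")
    print("bridge options:", bridge)
    print("java options:", java)
```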

 

Bridge Mapping

Meta Integration Repository (MIR)
Metamodel
(based on the OMG CWM standard)
"Informatica PowerCenter (Repository)"
Metamodel
InformaticaPowerCenterRepository
Mapping Comments
     
Association SourceField, TargetField Built from Foreign Key information
Aggregation KeyType, Nullable  
AssociationRole SourceField, TargetField  
Multiplicity   Based on Source information and KeyType attribute
Source   Based on Foreign Key information
AssociationRoleNameMap SourceField, TargetField  
Attribute SourceField, TargetField  
Description Description  
Name BusinessName Set to Name if BusinessName is not set
Optional Nullable  
PhysicalName Name Attribute Physical Name. Computed from 'name' if not set
Position FieldNumber  
BaseType SourceField, TargetField  
DataType Datatype See datatype conversion array
Name   Based on the datatype
PhysicalName   Derived from the datatype
CandidateKey SourceField, TargetField, TargetIndex  
UniqueKey KeyType, TargetIndex.Unique  
Class Source, Target  
CppClassType   Set to ENTITY
CppPersistent   Set to True
Description Description  
DimensionalRole dimensional Role metadata extension  
DimensionalType dimensional Role Type metadata extension  
Name BusinessName Set to Name if BusinessName is not set
PhysicalName Name Class Physical Name. Computed from the 'name' if not set
ClassifierMap Transformation, Instance  
Name Name  
DataAttribute TransformField  
Description Description  
Name Name  
DataSet Transformation, Instance  
Name Name  
DatabaseSchema Source, Target Tables and views belonging to the owner
Name DBD name or connection name DBD name is for sources only. Connection name if present.
DerivedType SourceField, TargetField  
DataType Datatype See datatype conversion array
Length Precision  
Name Datatype See datatype conversion array
PhysicalName   Derived from the datatype
Scale Scale  
UserDefined   Set to False
DesignPackage Repository, Folder, FolderVersion, Source, Target A SOURCE and a TARGET packages are created to hold the source sub-packages and the target Classes respectively. Packages are also created for Mappings and Workflows.
Description Description  
Name Name, Source.DBDName Name, Source.DBDName
UserDefined   Set to True
FeatureMap TransformField  
Name Name  
Operation Expression  
ForeignKey SourceField, TargetField  
Index TargetIndex  
Name Name  
IndexMember TargetIndexField  
Position   position in XML file
StoreModel PowerCenter file  
Name   Based on the file name