tMap - 6.3

Talend Open Studio Components Reference Guide

EnrichVersion
6.3
EnrichProdName
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for Data Quality
Talend Open Studio for ESB
Talend Open Studio for MDM
task
Data Governance
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

Function

tMap is an advanced component that integrates into Talend Studio as a plugin.

Purpose

tMap transforms and routes data from single or multiple sources to single or multiple destinations.

tMap properties

Component family

Processing

 

Basic settings

Map editor

It allows you to define the tMap routing and transformation properties.

If needed, click the button at the top of the input area to open the [Property Settings] dialog box, which provides the following options:

  • Die on error: Select this check box if you want to kill the Job if there is an error. This check box is selected by default.

  • Enable Auto-Conversion of types: If the input and output columns of a mapping have different data types, select this check box to enable automatic type conversion at run time and avoid compilation errors.

    This option is enabled by default if the Enable Auto-Conversion of types check box is selected in the Project Settings view when this component is added. You can also override the default conversion behavior of this component by setting conversion rules in the Project Settings view. For more information, see Talend Studio User Guide.

    Note that auto conversion between Date and BigDecimal is not supported.

  • Store on disk: The options provided in this area are identical to the relevant options provided on the Basic settings and Advanced settings tabs respectively. Settings made in the [Property Settings] dialog box are reflected in the respective tab views and vice versa.
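As a rough illustration of what the Enable Auto-Conversion of types option spares you from writing, the sketch below converts a String input value to an Integer output value by hand in plain Java. The class and method names are hypothetical, and this is not the code that tMap generates; it only shows the kind of explicit conversion a mapping expression would otherwise need when column types differ.

```java
// Hypothetical sketch: manual String -> Integer conversion, written by hand.
// This is NOT tMap's generated code, only an illustration of the idea.
class AutoConversionSketch {

    // Converts a String column value to Integer; null or blank input stays null.
    static Integer toInteger(String value) {
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        // Throws NumberFormatException if the text is not a valid integer.
        return Integer.valueOf(value.trim());
    }

    public static void main(String[] args) {
        System.out.println(toInteger(" 42 "));
        System.out.println(toInteger(""));
    }
}
```

A value that cannot be parsed raises an exception here, which is the kind of failure the Die on error option is concerned with.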

 

Mapping links display as

Auto: the default setting, which displays the mapping links as curves.

Curves: the mapping links display as curves.

Lines: the mapping links display as straight lines. This last option slightly improves performance.

Temp data directory path

Enter the path where you want to store the temporary data generated for lookup loading. For more information on this folder, see Talend Studio User Guide.

 

Preview

The preview is a snapshot of the Mapper data. It becomes available when the Mapper properties have been filled in with data. The preview synchronization takes effect only after you save your changes.

Advanced settings

Max buffer size (nb of rows)

Type in the size of physical memory, in number of rows, you want to allocate to processed data.

Ignore trailing zeros for BigDecimal

Select this check box to ignore trailing zeros for BigDecimal data.
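To see why trailing zeros matter for BigDecimal data, the plain Java sketch below compares 1.50 and 1.5 with and without stripping trailing zeros. The class and method names are hypothetical, and how tMap applies the option internally is not described here; the sketch only shows the standard java.math.BigDecimal behavior that makes such an option useful.

```java
import java.math.BigDecimal;

// Hypothetical sketch of standard BigDecimal behavior, not tMap internals.
class TrailingZerosSketch {

    // equals() compares both value and scale, so 1.50 and 1.5 are different.
    static boolean strictEquals(BigDecimal a, BigDecimal b) {
        return a.equals(b);
    }

    // Stripping trailing zeros first makes numerically equal values match.
    static boolean equalsIgnoringTrailingZeros(BigDecimal a, BigDecimal b) {
        return a.stripTrailingZeros().equals(b.stripTrailingZeros());
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.50");
        BigDecimal b = new BigDecimal("1.5");
        System.out.println(strictEquals(a, b));                // false
        System.out.println(equalsIgnoringTrailingZeros(a, b)); // true
    }
}
```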
 

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
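As an illustration of reading an After variable, the sketch below looks up ERROR_MESSAGE in a map keyed by component name. In a Talend Job the map is typically the globalMap object and the key is prefixed with the component's unique name; the name tMap_1, the fallback text, and the HashMap used as a stand-in are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain HashMap stands in for the Job's global variable map.
class ErrorMessageSketch {

    // Reads the After variable of a component; returns a fallback when it is not set.
    static String errorMessage(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_ERROR_MESSAGE");
        return value == null ? "no error recorded" : (String) value;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        System.out.println(errorMessage(globalMap, "tMap_1"));

        // Assumed sample value, set by hand for the demonstration.
        globalMap.put("tMap_1_ERROR_MESSAGE", "sample failure");
        System.out.println(errorMessage(globalMap, "tMap_1"));
    }
}
```

Because ERROR_MESSAGE is an After variable, it is only meaningful in a component that runs after tMap has finished.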

Usage

Possible uses range from a simple reorganization of fields to the most complex Jobs involving data multiplexing or demultiplexing, transformation, concatenation, inversion, filtering, and more.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

Using tMap requires basic Java knowledge to fully exploit its functionality.

This component is a junction step, and for this reason it cannot be either a start or an end component in the Job.
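To give an idea of the level of Java knowledge involved, the sketch below shows the kind of expression commonly written in a mapping: string concatenation with a null-safe ternary operator. The class, method, and parameter names are hypothetical and do not come from any scenario in this guide.

```java
import java.util.Locale;

// Hypothetical sketch of a typical mapping-style expression in plain Java.
class ExpressionSketch {

    // Concatenates a trimmed first name and an upper-cased last name,
    // treating null inputs as empty strings.
    static String fullName(String first, String last) {
        String f = first == null ? "" : first.trim();
        String l = last == null ? "" : last.trim().toUpperCase(Locale.ROOT);
        return (f + " " + l).trim();
    }

    public static void main(String[] args) {
        System.out.println(fullName("  Ann ", "lee")); // Ann LEE
        System.out.println(fullName("Ann", null));     // Ann
    }
}
```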

Scenario 1: Mapping data using a filter and a simple explicit join

The Job described below reads data from a CSV file whose schema is stored in the Repository, looks up a reference file whose schema is also stored in the Repository, and then, based on a defined filter, extracts data from these two files to an output file and to reject files.

Linking the components

  1. Drop two tFileInputDelimited components, a tMap component, and three tFileOutputDelimited components onto the design workspace.

  2. Rename the two tFileInputDelimited components as Cars and Owners, either by double-clicking the label in the design workspace or via the View tab of the Component view.

  3. Connect the two input components to tMap using Row > Main connections and label the connections as Cars_data and Owners_data respectively.

  4. Connect tMap to the three output components using Row > New Output (Main) connections and name the output connections as Insured, Reject_NoInsur and Reject_OwnerID respectively.

Configuring the components

  1. Double-click the tFileInputDelimited component labelled Cars to display its Basic settings view.

  2. Select Repository from the Property type list and select the component's schema, cars in this scenario, from the [Repository Content] dialog box. The remaining fields are filled in automatically.

  3. Double-click the component labelled Owners and repeat the setting operation. Select the appropriate metadata entry, owners in this scenario.

    Note

    In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further information regarding metadata creation in the Repository, see Talend Studio User Guide.

  4. Double-click the tMap component to open the Map Editor.

    Note that the input area is already filled with the defined input tables and that the top table is the main input table, and the respective row connection labels are displayed on the top bar of the table.

  5. Create a join between the two tables on the ID_Owner column by simply dropping the ID_Owner column from the Cars_data table onto the ID_Owner column in the Owners_data table.

  6. Define this join as an inner join by clicking the tMap settings button, clicking in the Value field for Join Model, clicking the small button that appears in the field, and selecting Inner Join from the [Options] dialog box.

  7. Drag all the columns of the Cars_data table to the Insured table.

  8. Drag the ID_Owner, Registration, and ID_Reseller columns of the Cars_data table and the Name column of the Owners_data table to the Reject_NoInsur table.

  9. Drag all the columns of the Cars_data table to the Reject_OwnerID table.

    For more information regarding data mapping, see Talend Studio User Guide.

  10. Click the plus arrow button at the top of the Insured table to add a filter row.

    Drag the ID_Insurance column of the Owners_data table to the filter condition area and enter the formula meaning 'not undefined': Owners_data.ID_Insurance != null.

    With this filter, the Insured table will gather all the records that include an insurance ID.

  11. Click the tMap settings button at the top of the Reject_NoInsur table and set Catch output reject to true to define the table as a standard reject output flow to gather the records that do not include an insurance ID.

  12. Click the tMap settings button at the top of the Reject_OwnerID table and set Catch lookup inner join reject to true so that this output table will gather the records from the Cars_data flow with missing or unmatched owner IDs.

    Click OK to validate the mappings and close the Map Editor.

  13. Double-click each of the output components, one after the other, to define their properties. If you want a new file to be created, browse to the destination output folder, and type in a file name including the extension.

    Select the Include header check box to reuse the column labels from the schema as the header row in the output file.
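The routing configured in the steps above can be sketched in plain Java as follows. This is a simplified model, not the code generated by the Studio: each car is reduced to a registration and an owner ID, the Owners_data lookup is reduced to a map from ID_Owner to ID_Insurance, and all sample values are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the Scenario 1 routing; names and data are assumptions.
class Scenario1Sketch {

    // The three output flows named in the scenario.
    static class Outputs {
        final List<String> insured = new ArrayList<>();
        final List<String> rejectNoInsur = new ArrayList<>();
        final List<String> rejectOwnerId = new ArrayList<>();
    }

    // cars: each entry is {Registration, ID_Owner}.
    // insuranceByOwner: ID_Owner -> ID_Insurance (the value may be null).
    static Outputs route(List<String[]> cars, Map<String, String> insuranceByOwner) {
        Outputs out = new Outputs();
        for (String[] car : cars) {
            String registration = car[0];
            String ownerId = car[1];
            if (ownerId == null || !insuranceByOwner.containsKey(ownerId)) {
                // Inner join found no owner: "Catch lookup inner join reject".
                out.rejectOwnerId.add(registration);
            } else if (insuranceByOwner.get(ownerId) != null) {
                // Filter Owners_data.ID_Insurance != null is satisfied.
                out.insured.add(registration);
            } else {
                // Joined, but rejected by the filter: "Catch output reject".
                out.rejectNoInsur.add(registration);
            }
        }
        return out;
    }

    // Runs the routing on a small invented data set.
    static Outputs sample() {
        List<String[]> cars = Arrays.<String[]>asList(
                new String[] {"AAA-111", "1"},
                new String[] {"BBB-222", "2"},
                new String[] {"CCC-333", "9"});
        Map<String, String> insuranceByOwner = new HashMap<>();
        insuranceByOwner.put("1", "INS-01"); // owner with insurance
        insuranceByOwner.put("2", null);     // owner without insurance
        return route(cars, insuranceByOwner);
    }

    public static void main(String[] args) {
        Outputs out = sample();
        System.out.println("Insured: " + out.insured);
        System.out.println("Reject_NoInsur: " + out.rejectNoInsur);
        System.out.println("Reject_OwnerID: " + out.rejectOwnerId);
    }
}
```

In this model every input row lands in exactly one of the three outputs, which mirrors how the filter and the two reject settings divide the Cars_data flow.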

Executing the Job

  1. Press Ctrl + S to save your Job.

  2. Press F6 to run the Job.

    The output files are created, which contain the relevant data as defined.

Scenario 2: Mapping data using inner join rejections

This scenario, based on scenario 1, adds one input file containing details about resellers and extra fields in the main output table. Two filters on inner joins are added to gather specific rejections.

Linking the components

  1. Drop a tFileInputDelimited component and a tFileOutputDelimited component to the design workspace, and label the components as Resellers and No_Reseller_ID respectively.

  2. Connect the Resellers component to the tMap component using a Row > Main connection, and label the connection as Resellers_data.

  3. Connect the tMap component to the new tFileOutputDelimited component by using the Row connection named Reject_ResellerID.

Configuring the components

  1. Double-click the Resellers component to display its Basic settings view.

  2. Select Repository from the Property type list and select the component's schema, resellers in this scenario, from the [Repository Content] dialog box. The remaining fields are filled in automatically.

    Note

    In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further information regarding metadata creation in the Repository, see Talend Studio User Guide.

  3. Double-click the tMap component to open the Map Editor.

    Note that the schema of the new input component is already added in the Input area.

  4. Create a join between the main input flow and the new input flow by dropping the ID_Reseller column of the Cars_data table to the ID_Reseller column of the Resellers_data table.

  5. Click the tMap settings button at the top of the Resellers_data table and set Join Model to Inner Join.

  6. Drag all the columns of the Resellers_data table except ID_Reseller to the main output table, Insured.