tHMap - 6.3

Talend Components Reference Guide

EnrichVersion
6.3
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for Data Quality
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
task
Data Governance
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

Warning

This component is available in the Palette of the Studio only if you have subscribed to one of the Talend Platform products.

Function

tHMap transforms data from a wide range of sources to a wide range of destinations. If you want to use multiple inputs and/or outputs, you must use Talend Data Mapper I/O functions. For more information, see Talend Data Mapper User Guide.

Purpose

tHMap executes transformations (called maps) between different sources and destinations by harnessing the capabilities of Talend Data Mapper, available in the Mapping perspective.

tHMap properties

Component family

Processing

 

Basic settings

Open Map Editor

Click the [...] button to open the tHMap Structure Generate/Select wizard where you can either have the hierarchical mapper structure generated automatically based on the schema, or select an existing hierarchical mapper structure. You must do this for both the input and output sides of your Map.

 

Map Path

Specifies the map to be executed.

If the map was automatically created using the wizard described above, this path is set automatically.

If you want to use an existing map, click the [...] button next to the Map Path field and select the map in the dialog box that opens. To work with the selected map, click the [...] button next to Open Map Editor. Note that the map must already have been created in the Mapping perspective.

Read Input As

Select the radio button which corresponds to how you want the input to be read. Depending on your map, only some of the options may be available.

  • Data Integration columns (default): Use this option if you are working with Talend Data Integration metadata.

  • Single column: Use this option if you are working with Talend Data Mapper metadata.

Write Output As

Select the radio button which corresponds to how you want the output to be written. Depending on your map, only some of the options may be available.

  • Data Integration columns (default): Use this option if you are working with Talend Data Integration metadata.

  • String (single column): Use this option if the data in the output column is to be a String.

  • Byte array (single column): Use this option if the data in the output column is to be a Byte array.

  • InputStream (single column): Use this option if you are working with Talend Data Mapper metadata and the input data is a stream.

  • Document (single column): Use this option if the output column is to be a Document.

Advanced settings

Map Variable

In this field, enter a context variable that you can use to define the relative path to a map file. For instance, if you enter ${context.mymapfile}, then mymapfile can point to different map files at runtime. This can be useful in cases where you want to use multiple maps without creating a new Job each time.

In the Contexts tab, the value must be a relative path. For instance, assuming you have a map called mapA in the folder Maps/FolderA, your context variable should contain the value "FolderA/mapA.xml". The .xml extension is needed because this is a reference to a file on the file system.

Note that all maps that might be referenced by the context variable must be present in the same Project. This way, when the Job is built, it contains all candidate maps and you can switch from one to another at runtime.

For further information on working with context variables, see Talend Studio User Guide.
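The two constraints stated above for the context value (a relative path, ending in .xml) can be illustrated with a small check. This is a minimal sketch in plain Java, not code generated by the Studio; the class name MapPathCheck and the method isValidMapReference are hypothetical, and the check covers only the constraints described in this section.

```java
import java.nio.file.Paths;

// Hypothetical helper, not part of Talend: checks a candidate value for the
// context variable referenced in the Map Variable field.
final class MapPathCheck {

    // Returns true if the value is a relative path that ends in ".xml",
    // the two constraints described for the value in the Contexts tab.
    static boolean isValidMapReference(String value) {
        if (value == null || value.isEmpty()) {
            return false;
        }
        boolean relative = !Paths.get(value).isAbsolute();
        boolean hasXmlExtension = value.endsWith(".xml");
        return relative && hasXmlExtension;
    }

    public static void main(String[] args) {
        // "FolderA/mapA.xml" is the example value given in the text.
        System.out.println(isValidMapReference("FolderA/mapA.xml"));
        // Same map without the extension: rejected by this sketch.
        System.out.println(isValidMapReference("FolderA/mapA"));
    }
}
```

A check of this kind could be useful when the value is supplied from outside the Studio at runtime, since a wrong path would otherwise only surface when the map is executed.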

 

Map each row (disable virtual component)

Select this check box to have tHMap process each input row individually. This prevents tHMap from buffering the input rows before delivering them downstream.

This can be useful, for example, when you use the tHMap component with tSAPIDocReceiver as the input component and any schema-aware component as the output component, because telling the tSAPIDocReceiver component to keep listening forever would otherwise lead to rows never being delivered.

Log Level

From the drop-down list, select how often you want events to be logged.

  • Infrequent: Logs only events related to startup, shutdown and exceptions.

  • Frequent (default): Logs events related to startup, shutdown and exceptions, and once per map execution.

  • Info: Logs all events at an informational level or higher.

  • All: Logs all events.

  • None: Logs nothing.

 

Exception Threshold

Talend Data Mapper returns an execution status with a severity value which can be OK, Info, Warning, Error or Fatal. By setting the exception threshold, you can specify the severity level at which an exception is thrown, thus enabling downstream components to detect the error in cases other than the default value of Fatal.

From the drop-down list, select the severity level at which an exception may be thrown during the execution of a map.

  • Fatal (default): An exception is thrown when a fatal error occurs.

  • Error: An exception is thrown when an error (or higher) occurs.

  • Warning: An exception is thrown when a warning (or higher) occurs.

Note that, to help you diagnose problems with your map, any errors at warning level or above that occur when you test the map in the Studio are printed in the console window, regardless of the Exception Threshold setting.
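The threshold behavior described above amounts to comparing the severity of the execution status with the selected threshold. The following is an illustrative model only, assuming the severities are ordered as listed (OK, Info, Warning, Error, Fatal); the enum and method names are hypothetical and are not the Talend Data Mapper API.

```java
// Illustrative model only; this is not the Talend Data Mapper API.
final class ExceptionThresholdSketch {

    // Severity values in ascending order, as listed in the text.
    enum Severity { OK, INFO, WARNING, ERROR, FATAL }

    // An exception is thrown when the status severity reaches the threshold
    // or exceeds it ("or higher" in the option descriptions).
    static boolean shouldThrow(Severity status, Severity threshold) {
        return status.compareTo(threshold) >= 0;
    }

    public static void main(String[] args) {
        // Default threshold (Fatal): an Error status does not throw.
        System.out.println(shouldThrow(Severity.ERROR, Severity.FATAL));   // prints false
        // Threshold lowered to Warning: the same Error status now throws.
        System.out.println(shouldThrow(Severity.ERROR, Severity.WARNING)); // prints true
    }
}
```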

 tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

EXECUTION_STATUS: the pointer to the ExecutionStatus object, which is returned whenever tHMap executes a Talend Data Mapper map. This is an After variable and it returns a string.

EXECUTION_SEVERITY: the Overall Severity numeric value. This is an After variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
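As an illustration of how an After variable might be read in a component placed after tHMap, here is a minimal sketch in plain Java. It assumes the usual Studio convention in which global variables are stored in a map named globalMap under keys of the form <component name>_<VARIABLE>, and it assumes the component is named tHMap_1; both are assumptions, not statements from this page. The HashMap below is only a stand-in for the map that the Studio provides in a real Job, and the stored value is a placeholder.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: models reading tHMap After variables from a globalMap-like map.
final class GlobalVariableSketch {

    // Reads EXECUTION_SEVERITY for the given component name.
    // Returns null if the key is absent or does not hold an Integer,
    // for example because the component has not finished executing.
    static Integer readSeverity(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_EXECUTION_SEVERITY");
        return (value instanceof Integer) ? (Integer) value : null;
    }

    public static void main(String[] args) {
        // Stand-in for the Job's globalMap (assumption: provided by the Studio).
        Map<String, Object> globalMap = new HashMap<>();
        // Placeholder value; the meaning of each number is not defined here.
        globalMap.put("tHMap_1_EXECUTION_SEVERITY", 0);

        Integer severity = readSeverity(globalMap, "tHMap_1");
        System.out.println("Severity of tHMap_1: " + severity);
    }
}
```

Returning null rather than casting directly keeps the sketch safe when the variable is read before tHMap has finished, which is when an After variable is not yet available.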

Usage

tHMap is used for Jobs that require complex data mapping from a variety of different sources.

The input and output connections can use Talend Data Mapper metadata, Talend Data Integration metadata, or a combination of the two. Each connection is independent.

When you open the Map Editor for the first time for each connection, it either generates a Talend Data Mapper structure definition based on the schema of the Talend Data Integration component, or allows you to select an existing Talend Data Mapper structure if you are using Talend Data Mapper metadata. It then creates a map with the structure selected or generated.

When running Jobs which include a tHMap component in the Runtime, ensure that Talend Data Mapper has been correctly deployed to the Runtime.

This component can be used in several ways; the scenarios below illustrate two of them.

Limitation

n/a

Note

For further information about performing transformations using Talend Data Mapper, see Talend Data Mapper User Guide.

Scenario 1: Using Talend Data Mapper metadata

The following scenario creates a three-component Job that reads data from an input file, transforms it using a map previously created in the Mapping perspective, and writes the transformed data to a new file. It works with Talend Data Mapper metadata.

Copying an editable version of the example files

  1. In the Mapping perspective, in the Data Mapper view, expand the Hierarchical Mapper node and the Other Projects folder, right-click Examples and then select Copy in the contextual menu.

  2. In the Data Mapper view, right-click at the root of the Hierarchical Mapper node, and then select Paste in the contextual menu.

    This copies an editable version of all the read-only example files to your local workspace.

Adding and linking the components

  1. In the Integration perspective, create a new Job and call it tdm_to_tdm.

  2. Click the point in the design workspace where you want to add the first component, start typing tFileInputRaw, and then click the name of the component when it appears in the list proposed in order to select it.

  3. Do the same to add a tHMap component and a tFileOutputRaw component as well.

  4. Connect the tFileInputRaw component to the tHMap component using a Row > Main link and rename it input, then connect the tHMap component to the tFileOutputRaw component using a Row > Main link and name it output. When you are asked if you want to get the schema of the target component, click Yes.

Defining the properties of tFileInputRaw

  1. Select the tFileInputRaw component to define its properties.

  2. In the Basic settings tab, click the [...] button next to the Filename field then browse to the location on your file system where the input file is stored, or enter the path manually between double quotes. For this example, use <PATH_TO_WORKSPACE>/<PROJECT_NAME>/Sample Data/CSV/PurchaseOrderPayPal/PayPalPO.csv.

  3. Set the Mode as Read the file as a string, and leave all the other parameters unchanged.

Defining the properties of tFileOutputRaw

  1. Select the tFileOutputRaw component to define its properties.

  2. In the Basic settings tab, click the [...] button then browse to the location on your file system where the output file is to be stored, or enter the path manually between double quotes. Leave the other parameters unchanged.

Defining the properties of tHMap

  1. Select the tHMap component to define its properties.

  2. Click the [...] button next to the Map Path field to open the picker and select the map to use, Maps/CSV/POPayPalCsv_PO2, then click OK. This map transforms a CSV file into an XML file.

  3. Check that Read Input As is set to Single column.

  4. Check that Write Output As is set to String (single column).

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. In the Run tab, click Run to execute the Job.

  3. Browse to the location on your file system where the output file is stored to check that an XML file has been created containing the same data as the input CSV file.

Scenario 2: Using Talend Data Integration metadata

The following scenario creates a three-component Job that reads data from an input file, transforms it using a map that you create in the Mapping perspective, and writes the transformed data to a new file. It works with Talend Data Integration metadata.

Copying an editable version of the example files

  1. In the Mapping perspective, in the Data Mapper view, expand the Hierarchical Mapper node and the Other Projects folder, right-click Examples and then select Copy in the contextual menu.

  2. In the Data Mapper view, right-click at the root of the Hierarchical Mapper node, and then select Paste in the contextual menu.

    This copies an editable version of all the read-only example files to your local workspace.

Adding and linking the components

  1. In the Integration perspective, create a new Standard Job and call it di_to_di.

  2. Click the point in the design workspace where you want to add the first component, start typing tFileInputDelimited, and then click the name of the component when it appears in the list proposed in order to select it.

  3. Do the same to add a tHMap component and a tFileOutputXML component as well.

  4. Connect the tFileInputDelimited component to the tHMap component using a Row > Main link, then connect the tHMap component to the tFileOutputXML component using a Row > Main link.

Defining the properties of tFileInputDelimited

  1. Select the tFileInputDelimited component to define its properties.

  2. In the Basic settings tab, click the [...] button next to the File name/Stream field then browse to the location on your file system where the input file is stored, or enter the path manually between double quotes. For this example, use <PATH_TO_WORKSPACE>/<PROJECT_NAME>/Sample Data/CSV/PurchaseOrderPayPal/PayPalPO.csv.

  3. Select the CSV options check box.

  4. Change the Field Separator to a comma, between double quotes (",").

  5. Change the value of Header to 1.

  6. Click the [...] button next to Edit schema to define the schema.

  7. Add three columns and rename them txn_id, payment_date and first_name. These names correspond to the first three columns in the input file, which is sufficient for the purposes of this example. Then click OK.

  8. Leave all the other parameters unchanged.
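To make the effect of these settings concrete, the sketch below models them in plain Java: one header line is skipped, each remaining line is split on a comma, and only the first three fields are kept, matching the three-column schema. It is a simplified illustration and not the code the component generates; in particular it does not handle quoted fields, which the CSV options setting is meant to cover. The class and method names are hypothetical, and the sample rows are made up rather than taken from PayPalPO.csv.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Simplified model of the settings above; not Talend-generated code.
final class DelimitedReadSketch {

    // Skips `header` leading lines, splits each remaining line on `separator`,
    // and keeps only the first `columnCount` fields of each row.
    // Naive split: quoted fields containing the separator are not supported.
    static List<String[]> read(String content, String separator, int header, int columnCount) {
        List<String[]> rows = new ArrayList<>();
        String[] lines = content.split("\\R");
        for (int i = header; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // ignore blank lines
            }
            String[] fields = lines[i].split(Pattern.quote(separator), -1);
            String[] row = new String[columnCount];
            for (int c = 0; c < columnCount; c++) {
                row[c] = c < fields.length ? fields[c] : null;
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample content for illustration only.
        String content = "txn_id,payment_date,first_name,last_name\n"
                       + "A1,2016-01-01,Ann,Lee\n";
        // Field Separator ",", Header 1, schema of three columns.
        for (String[] row : read(content, ",", 1, 3)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```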

Defining the properties of tFileOutputXML

  1. Select the tFileOutputXML component to define its properties.

  2. In the Basic settings tab, click the [...] button next to the File Name field then browse to the location on your file system where the output file will be stored, or enter the path manually between double quotes.

  3. Click the [...] button next to Edit schema to define the schema.

  4. Add three columns to the input schema on the left and rename them id, date and name, copy them to the output schema on the right, and then click OK.

  5. Leave the other elements unchanged.

Defining the properties of tHMap

  1. Select the tHMap component to define its properties.