Scenario 1: Mapping data using a filter and a simple explicit join - 6.1

Talend Components Reference Guide


The Job described below reads data from a CSV file whose schema is stored in the Repository and looks up a reference file whose schema is also stored in the Repository. It then extracts data from the two files, based on a defined filter, into one output file and two reject files.

Linking the components

  1. Drop two tFileInputDelimited components, tMap and three tFileOutputDelimited components onto the design workspace.

  2. Rename the two tFileInputDelimited components as Cars and Owners, either by double-clicking the label in the design workspace or via the View tab of the Component view.

  3. Connect the two input components to tMap using Row > Main connections and label the connections as Cars_data and Owners_data respectively.

  4. Connect tMap to the three output components using Row > New Output (Main) connections and name the output connections as Insured, Reject_NoInsur and Reject_OwnerID respectively.

Configuring the components

  1. Double-click the tFileInputDelimited component labelled Cars to display its Basic settings view.

  2. Select Repository from the Property type list and select the component's schema, cars in this scenario, from the [Repository Content] dialog box. The remaining fields are filled in automatically.

  3. Double-click the component labelled Owners and repeat the same settings, this time selecting the appropriate metadata entry, owners in this scenario.


    In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further information regarding metadata creation in the Repository, see the Talend Studio User Guide.

  4. Double-click the tMap component to open the Map Editor.

    Note that the input area is already filled with the defined input tables, with the main input table at the top. The label of each row connection is displayed on the top bar of its table.

  5. Create a join between the two tables on the ID_Owner column by dragging the ID_Owner column from the Cars_data table onto the ID_Owner column in the Owners_data table.

  6. Define this join as an inner join by clicking the tMap settings button, clicking in the Value field for Join Model, clicking the small button that appears in the field, and selecting Inner Join from the [Options] dialog box.

  7. Drag all the columns of the Cars_data table to the Insured table.

  8. Drag the ID_Owner, Registration, and ID_Reseller columns of the Cars_data table and the Name column of the Owners_data table to the Reject_NoInsur table.

  9. Drag all the columns of the Cars_data table to the Reject_OwnerID table.

    For more information regarding data mapping, see the Talend Studio User Guide.

  10. Click the plus arrow button at the top of the Insured table to add a filter row.

    Drag the ID_Insurance column of the Owners_data table to the filter condition area and enter an expression that keeps only the rows whose insurance ID is defined (not null): Owners_data.ID_Insurance != null.

    With this filter, the Insured table will gather all the records that include an insurance ID.

  11. Click the tMap settings button at the top of the Reject_NoInsur table and set Catch output reject to true. This defines the table as a standard reject output flow that gathers the records rejected by the Insured filter, that is, the records that do not include an insurance ID.

  12. Click the tMap settings button at the top of the Reject_OwnerID table and set Catch lookup inner join reject to true so that this output table will gather the records from the Cars_data flow with missing or unmatched owner IDs.

    Click OK to validate the mappings and close the Map Editor.

  13. Double-click each of the output components, one after the other, to define their properties. If you want a new file to be created, browse to the destination output folder, and type in a file name including the extension.

    Select the Include header check box to reuse the column labels from the schema as the header row in the output file.
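
The routing that this tMap configuration produces can be sketched in plain Java, the language used for tMap expressions. This is only an illustrative sketch, not the code that Talend Studio generates: the class and method names are hypothetical, only the columns named in this scenario are modeled, all columns are assumed to be strings, each ID_Owner is assumed to identify at most one owner, and the sample rows in main are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TMapRoutingSketch {

    // Hypothetical row type for the Cars_data flow (main input).
    static final class Car {
        final String idOwner, registration, idReseller;
        Car(String idOwner, String registration, String idReseller) {
            this.idOwner = idOwner;
            this.registration = registration;
            this.idReseller = idReseller;
        }
    }

    // Hypothetical row type for the Owners_data flow (lookup input).
    static final class Owner {
        final String idOwner, name, idInsurance;
        Owner(String idOwner, String name, String idInsurance) {
            this.idOwner = idOwner;
            this.name = name;
            this.idInsurance = idInsurance;
        }
    }

    // One list per output table; rows are reduced to short strings for brevity.
    static final class Routed {
        final List<String> insured = new ArrayList<>();
        final List<String> rejectNoInsur = new ArrayList<>();
        final List<String> rejectOwnerId = new ArrayList<>();
    }

    static Routed route(List<Car> cars, List<Owner> owners) {
        // Index the lookup flow by the join key ID_Owner.
        Map<String, Owner> ownersById = new HashMap<>();
        for (Owner o : owners) {
            ownersById.put(o.idOwner, o);
        }

        Routed out = new Routed();
        for (Car c : cars) {
            Owner o = (c.idOwner == null) ? null : ownersById.get(c.idOwner);
            if (o == null) {
                // Inner join failed: "Catch lookup inner join reject".
                out.rejectOwnerId.add(c.registration);
            } else if (o.idInsurance != null) {
                // Filter on the Insured table: Owners_data.ID_Insurance != null.
                out.insured.add(c.registration);
            } else {
                // Join matched but the filter rejected the row: "Catch output reject".
                out.rejectNoInsur.add(c.registration + ";" + o.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample rows, for illustration only.
        List<Car> cars = Arrays.asList(
            new Car("1", "AAA-111", "R1"),
            new Car("2", "BBB-222", "R1"),
            new Car("9", "CCC-333", "R2"),   // no owner with this ID
            new Car(null, "DDD-444", "R2")); // missing owner ID
        List<Owner> owners = Arrays.asList(
            new Owner("1", "Ann", "INS-1"),
            new Owner("2", "Bob", null));    // owner without insurance

        Routed r = route(cars, owners);
        System.out.println("Insured:        " + r.insured);
        System.out.println("Reject_NoInsur: " + r.rejectNoInsur);
        System.out.println("Reject_OwnerID: " + r.rejectOwnerId);
    }
}
```

In this sketch each car row lands in exactly one of the three lists, which mirrors how the inner join reject and the output reject complement the filtered Insured output.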

Executing the Job

  1. Press Ctrl + S to save your Job.

  2. Press F6 to run the Job.

    The three output files are created, each containing the records routed to it by the mappings defined in tMap.

    For examples of how to use dynamic schemas with tMap, see: