Mapping data using a filter and a simple explicit join

Talend Data Fabric Getting Started Guide

EnrichVersion
6.1
EnrichProdName
Talend Data Fabric
task
Design and Development
Data Quality and Preparation
Data Governance
EnrichPlatform
Talend Studio

The Job described below aims at reading data from a csv file, looking up at a reference file, and then extracting data from these two files based on a defined filter to an output file and reject files.

Adding and linking the components

  1. Add two tFileInputDelimited components, a tMap and three tFileOutputDelimited components onto the design workspace.

  2. Rename the two tFileInputDelimited components as Cars and Owners, either by double-clicking the label in the design workspace or via the View tab of the Component view.

  3. Connect the two input components to tMap using Row > Main connections and label the connections as Cars_data and Owners_data respectively.

  4. Connect tMap to the three output components using Row > New Output (Main) connections and name the output connections as Insured, Reject_NoInsur and Reject_OwnerID respectively.

Configuring the input components

  1. Double-click the tFileInputDelimited component labelled Cars to display its Basic settings view.

  2. Select Repository from the Property type list and select the component's schema, cars in this scenario, from the [Repository Content] dialog box. The rest fields are automatically filled.

    Note

    This scenario assumes that the metadata of the input files is stored in the Metadata node of the Repository tree view for easy retrieval. For further information regarding metadata creation in the Repository, see Talend Studio User Guide.

    If you do not have the metadata of your input files centralized in the Repository, you need to set the property type to Built-In and specify file path and define the file schema manually. Below is an abstract of the input file cars.scv:

    ID_Owner;Registration;Make;Color;ID_Reseller
    1;WZG 555;Ford;red;22
    2;HYZ 472;Lexus;red;39
    3;VYZ 862;Lexus;blue;21
    4;ZYZ 350;Audi;red;31
    5;EDZ 99;Audi;green;62
    6;ZZX 845;Citroen;black;75
    7;PBS 410;Renault;grey;11
    8;JFO 929;Citroen;white;86
    9;DPG 217;Lexus;black;13
  3. Double-click the component labelled Owners and repeat the setting operation. Select the appropriate metadata entry, owners in this scenario.

    If you do not have the metadata of your input files centralized in the Repository, you need to set the property type to Built-In and specify file path and define the file schema manually. Below is an abstract of the reference input file owners.csv:

    ID_Owner;Name;ID_Insurance;Chlidren_Nr
    1;George EISENHOWER;108;8
    2;James LINCOLN;35;8
    3;William TAFT;6;10
    4;Harry WILSON;134;3
    5;Woodrow HOOVER;45;8
    6;Chester TAYLOR;148;2
    7;John REAGAN;31;8
    8;Dwight POLK;105;7
    9;William PIERCE;177;2
    

Configuring the mapping component

  1. Double-click the tMap component to open the Map Editor.

    Note that the input area is already filled with the defined input tables and that the top table is the main input table, and the respective row connection labels are displayed on the top bar of the table.

  2. Create a join between the two tables on the ID_Owner column by simply dropping the ID_Owner column from the Cars_data table onto the ID_Owner column in the Owners_data table.

  3. Define this join as an inner join by clicking the tMap settings button, clicking in the Value field for Join Model, clicking the small button that appears in the field, and selecting Inner Join from the [Options] dialog box.

  4. Drag all the columns of the Cars_data table to the Insured table.

  5. Drag the ID_Owner, Registration, and ID_Reseller columns of the Cars_data table and the Name column of the Owners_data table to the Reject_NoInsur table.

  6. Drag all the columns of the Cars_data table to the Reject_OwnerID table.

    For more information regarding data mapping, see Talend Studio User Guide.

  7. Click the plus arrow button at the top of the Insured table to add a filter row.

    Drag the ID_Insurance column of the Owners_data table to the filter condition area and enter the formula meaning 'not undefined': Owners_data.ID_Insurance != null.

    With this filter, the Insured table will gather all the records that include an insurance ID.

  8. Click the tMap settings button at the top of the Reject_NoInsur table and set Catch output reject to true to define the table as a standard reject output flow to gather the records that do not include an insurance ID.

  9. Click the tMap settings button at the top of the Reject_OwnerID table and set Catch lookup inner join reject to true so that this output table will gather the records from the Cars_data flow with missing or unmatched owner IDs.

    Click OK to validate the mappings and close the Map Editor.

Configuring the output components

  1. Double-click each of the output components, one after the other, to define their properties. If you want a new file to be created, browse to the destination output folder, and type in a file name including the extension.

  2. Select the Include header check box to reuse the column labels from the schema as header row in the output file.

Executing the Job

  1. Press Ctrl + S to save your Job.

  2. Press F6 to run the Job.

    The output files are created, which contain the relevant data as defined.