Scenario 2: Mapping data using inner join rejections - 6.3

Talend Open Studio for Big Data Components Reference Guide

EnrichVersion
6.3
EnrichProdName
Talend Open Studio for Big Data
task
Data Governance
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

This scenario, based on scenario 1, adds one input file containing details about resellers and extra fields in the main output table. Two filters on inner joins are added to gather specific rejections.

Linking the components

  1. Drop a tFileInputDelimited component and a tFileOutputDelimited component to the design workspace, and label the components as Resellers and No_Reseller_ID respectively.

  2. Connect it to the Mapper using a Row > Main connection, and label the connection as Resellers_data.

  3. Connect the tMap component to the new tFileOutputDelimited component by using the Row connection named Reject_ResellerID.

Configuring the components

  1. Double-click the Resellers component to display its Basic settings view.

  2. Select Repository from the Property type list and select the component's schema, resellers in this scenario, from the [Repository Content] dialog box. The rest fields are automatically filled.

    Note

    In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further information regarding metadata creation in the Repository, see Talend Studio User Guide.

  3. Double-click the tMap component to open the Map Editor.

    Note that the schema of the new input component is already added in the Input area.

  4. Create a join between the main input flow and the new input flow by dropping the ID_Reseller column of the Cars_data table to the ID_Reseller column of the Resellers_data table.

  5. Click the tMap settings button at the top of the Resellers_data table and set Join Model to Inner Join.

  6. Drag all the columns except ID_Reseller of the Resellers_data table to the main output table, Insured.

    Note

    When two inner joins are defined, you either need to define two different inner join reject tables to differentiate the two rejections or, if there is only one inner join reject output, both inner join rejections will be stored in the same output.

  7. Click the [+] button at the top of the output area to add a new output table, and name this new output table Reject_ResellerID.

  8. Drag all the columns of the Cars_data table to the Reject_ResellerID table.

  9. Click the tMap settings button and select Catch lookup inner join reject to true to define this new output table as an inner join reject output.

    If the defined inner join cannot be established, the information about the relevant cars will be gathered through this output flow.

  10. Now apply filters on the two Inner Join reject outputs, in order for to distinguish the two types of rejection.

    In the first Inner Join output table, Reject_OwnerID, click the plus arrow button to add a filter line and fill it with the following formula to gather only owner ID related rejection: Owners_data.ID_Owner==null

  11. In the second Inner Join output table, Reject_ResellerID, repeat the same operation using the following formula: Resellers_data.ID_Reseller==null

    Click OK to validate the map settings and close the Mapper Editor.

  12. Double-click the No_Reseller_ID component to display its Basic settings view.

    Specify the output file path and select the Include Header check box, and leave the other parameters as they are.

  13. To demonstrate the work of the Mapper, in this example, remove reseller IDs 5 and 8 from the input file Resellers.csv.

Executing the Job

  1. Press Ctrl + S to save your Job.

  2. Press F6 to run the Job.

    The four output files are all created in the specified folder, containing information as defined. The output file No_Reseller_ID.csv contains the cars information related to reseller IDs 5 and 8, which are missing in the input file Resellers.csv.