Scenario 4: Advanced mapping using filters, explicit joins and rejections - 6.3

Talend Open Studio for Big Data Components Reference Guide

EnrichVersion
6.3
EnrichProdName
Talend Open Studio for Big Data
task
Data Governance
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

This scenario introduces a Job that allows you to find BMW owners who have two to six children (inclusive), for sales promotion purpose for example.

Linking the components

  1. Drop three tFileInputDelimited components, a tMap component, and two tFileOutputDelimited components from the Palette onto the design workspace, and label them to best describe their functions.

  2. Connect the input components to the tMap using Row > Main connections.

    Pay attention to the file you connect first as it will automatically be set as Main flow, and all the other connections will be Lookup flows. In this example, the connection for the input component Owners is the Main flow.

Configuring the components

  1. Define the properties of each input components in the respective Basic settings view. Define the properties of Owners.

  2. Select Repository from the Property type list and select the component's schema, owners in this scenario, from the [Repository Content] dialog box. The rest fields are automatically filled.

    Note

    In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further information regarding metadata creation in the Repository, see Talend Studio User Guide.

    In the same way, set the properties of the other input components: Cars and Resellers. These two Lookup flows will fill in secondary (lookup) tables in the input area of the Map Editor.

  3. Then double-click the tMap component to launch the Map Editor and define the mappings and filters.

    Set an explicit join between the Main flow Owner and the Lookup flow Cars by dropping the ID_Owner column of the Owners table to the ID_Owner column of the Cars table.

    The explicit join is displayed along with a hash key.

  4. In the Expr. Key field of the Make column, type in a filter. In this use case, simply type in "BMW" as the search is focused on the owners of this particular make.

  5. Implement a cascading join between the two lookup tables Cars and Resellers on the ID_Reseller column in order to retrieve resellers information.

  6. As you want to reject the null values into a separate table and exclude them from the standard output, click the tMap settings button and set Join Model to Inner Join in each of the Lookup tables.

  7. In the tMap settings, you can set Match Model to Unique match, First match, or All matches. In this use case, the All matches option is selected. Thus if several matches are found in the Inner Join, rows matching the explicit join as well as the filter, all of them will be added to the output flow (either in rejection or the regular output).

    Note

    The Unique match option functions as a Last match. The First match and All matches options function as named.

  8. On the output area of the Map Editor, click the plus button to add two tables, one for the full matches and the other for the rejections.

  9. Drag all the columns of the Owners table, the Registration, Make and Color columns of the Cars table, and the ID_Reseller and Name_Reseller columns of the Resellers table to the main output table.

  10. Drag all the columns of the Owners table to the reject output table.

  11. Click the Filter button at the top of the main output table to display the Filter expression area.

    Type in a filter statement to narrow down the number of rows loaded in the main output flow. In this use case, the statement reads: Owners.Children_Nr >= 2 && Owners.Children_Nr <= 6.

  12. In the reject output table, click the tMap settings button and set the reject types.

    Set Catch output reject to true to collect data about BMW car owners who have less than two or more than six children.

    Set Catch lookup inner join reject to true to collect data about owners of other car makes and owners for whom the reseller information is not found.

    Click OK to validate the mappings and close the Map Editor.

    On the design workspace, right-click the tMap and pull the respective output link to the relevant output components.

  13. Define the properties of the output components in their respective Basic settings view.

    In this use case, simple specify the output file paths and select the Include Header check box, and leave the other parameters as they are.

Executing the Job

  1. Press Ctrl + S to save your Job.

  2. Press F6 to run it.

    The main output file contains the information related to BMW owners who have two to six children, and the reject output file contains the information about the rest of the car owners.