Configuring the components - 7.3

Processing (Integration)

Version
7.3
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-21

Procedure

  1. Double-click the tFileInputDelimited component labelled Main_Input to display its Basic settings view.
    Warning:

    The dynamic schema feature is only supported in Built-In mode and requires the input file to have a header row.

  2. Click the [...] button next to the File Name/Stream field to browse to your main input file, and enter 1 in the Header field to define the first row as the header row.
    In this use case, the main input file contains the following information:
    FirstName;LastName;HouseNo;Street;City
    Gerald;Roosevelt;48;Fairview Avenue;Oklahoma City
    Benjamin;Harrison;27;Katella Avenue;Little Rock
    Bob;Clinton;11;Bowles Avenue;Raleigh
    James;Quincy;45;Cerrillos Road;Saint Paul
    Gerald;Harrison;27;Katella Avenue;Little Rock
    Harry;Madison;85;Santa Monica Road;Raleigh
    Helen;Roosevelt;48;Fairview Avenue;Oklahoma City
    Mary;Clinton;11;Bowles Avenue;Raleigh
    Cathey;Quincy;45;Cerrillos Road;Saint Paul
    John;Smith;64;Market Street;Helena
  3. Click Edit schema to define the schema for this component.
    In this use case, the main input file has five columns: FirstName, LastName, HouseNo, Street, and City. However, thanks to the dynamic schema feature, you only need to define two columns: one String column for the first names of people, and one Dynamic column for the family information. To do so:
    1. Click the [+] button to add two columns, and name them FirstName and FamilyInfo respectively.
    2. Select String from the Type list for the FirstName column to retrieve the first name of each person on the name list.
    3. Select Dynamic from the Type list for the FamilyInfo column to retrieve the remaining information about each person on the name list: the last name, house number, street, and city, which together identify a family.
    4. Click OK to propagate the schema and close the Schema dialog box.
  4. Following steps similar to the above, define the properties for the tFileInputDelimited component labelled Ref_Input: the path to the reference input file, the header row, and the schema. This time, just define one dynamic column, FamilyInfo, to retrieve the four columns of the reference input file, which contains the following information:
    LastName;HouseNo;Street;City
    Clinton;11;Bowles Avenue;Raleigh
    Quincy;45;Cerrillos Road;Saint Paul
    Smith;64;Market Street;Helena
  5. Double-click the tJoin component to open its Basic settings view.
  6. Click Edit schema to open the Schema dialog box, where you can check the data structures of the input files and define the data you want to pass to the output components.
    In this scenario, we want to pass both columns of the main input file, FirstName and FamilyInfo, to the output files, so simply copy the schema columns of the main input file by clicking the ->> button. Then, click OK to validate the schema and close the dialog box.
  7. In the Key definition area, click the [+] button to add one column to the list. Then select the input column you want to match from the Input key attribute list, and the reference column against which you want to match it from the Lookup key attribute list: FamilyInfo and row2.FamilyInfo respectively in this example.
  8. Make sure that the Inner join (with reject output) check box is selected to define one of the outputs as the inner join reject table.
  9. In the Basic settings view of each tLogRow component, select the Table option to display the output information in table cells.
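
The configuration above can be approximated outside Talend Studio with a short Python sketch, which may help in checking which rows to expect on each output. This is not Talend code and does not describe how tJoin is implemented. It assumes that a Dynamic column behaves like a tuple of all remaining field values and that two FamilyInfo values match only when every field is equal. The names read_rows and join_with_reject are hypothetical helpers introduced for illustration.

```python
import csv
import io

# Sample data copied from the procedure above.
MAIN_INPUT = """FirstName;LastName;HouseNo;Street;City
Gerald;Roosevelt;48;Fairview Avenue;Oklahoma City
Benjamin;Harrison;27;Katella Avenue;Little Rock
Bob;Clinton;11;Bowles Avenue;Raleigh
James;Quincy;45;Cerrillos Road;Saint Paul
Gerald;Harrison;27;Katella Avenue;Little Rock
Harry;Madison;85;Santa Monica Road;Raleigh
Helen;Roosevelt;48;Fairview Avenue;Oklahoma City
Mary;Clinton;11;Bowles Avenue;Raleigh
Cathey;Quincy;45;Cerrillos Road;Saint Paul
John;Smith;64;Market Street;Helena
"""

REF_INPUT = """LastName;HouseNo;Street;City
Clinton;11;Bowles Avenue;Raleigh
Quincy;45;Cerrillos Road;Saint Paul
Smith;64;Market Street;Helena
"""


def read_rows(text, fixed_columns):
    """Parse semicolon-delimited text whose first row is a header.

    The first `fixed_columns` fields become named columns; all remaining
    fields are grouped into one tuple stored under "FamilyInfo"
    (an assumed stand-in for a Dynamic column).
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)  # Header = 1: skip and use the first row as names
    rows = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        row = dict(zip(header[:fixed_columns], fields[:fixed_columns]))
        row["FamilyInfo"] = tuple(fields[fixed_columns:])
        rows.append(row)
    return rows


def join_with_reject(main_rows, ref_rows):
    """Split main rows into inner-join matches and rejects on FamilyInfo."""
    lookup = {row["FamilyInfo"] for row in ref_rows}
    matched = [row for row in main_rows if row["FamilyInfo"] in lookup]
    rejected = [row for row in main_rows if row["FamilyInfo"] not in lookup]
    return matched, rejected


main_rows = read_rows(MAIN_INPUT, fixed_columns=1)  # FirstName + FamilyInfo
ref_rows = read_rows(REF_INPUT, fixed_columns=0)    # FamilyInfo only
matched, rejected = join_with_reject(main_rows, ref_rows)

for label, rows in (("Matched", matched), ("Rejected", rejected)):
    print(label)
    for row in rows:
        print("  ", row["FirstName"], "|", ";".join(row["FamilyInfo"]))
```

Under these assumptions, the five people whose family information appears in the reference file (the Clinton, Quincy, and Smith households) fall into the matched list, and the remaining five fall into the rejected list.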