Regrouping sorted rows - 7.2

Processing (Integration)


This Java scenario describes a four-component Job that reads a delimited file row by row, sorts the input data by sort type and order, denormalizes the sorted rows, and displays the output on the Run console.

For more technologies supported by Talend, see Talend components.

  • Drop the following components from the Palette onto the design workspace: tFileInputDelimited, tSortRow, tDenormalizeSortedRow, and tLogRow.

  • Connect the four components using Row Main links.

  • In the design workspace, select tFileInputDelimited.

  • Click the Component tab to define the basic settings for tFileInputDelimited.

  • Set Property Type to Built-In.

  • In the File Name field, enter the path to the file to be processed. The name_list file used in this example holds two columns, id and first name.

  • If needed, define row and field separators, header and footer, and the number of processed rows.

  • Set Schema to Built-In and click the three-dot button next to Edit Schema to define the data to pass on to the next component. The schema in this example consists of two columns, id and name.

  • In the design workspace, select tSortRow.

  • Click the Component tab to define the basic settings for tSortRow.

  • Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the tFileInputDelimited component.

  • In the Criteria panel, use the plus button to add a line and set the sorting parameters for the schema column to be processed. In this example, we want to sort the id column in ascending order.

  • In the design workspace, select tDenormalizeSortedRow.

  • Click the Component tab to define the basic settings for tDenormalizeSortedRow.

  • Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the tSortRow component.

  • In the Input rows count field, enter the number of input rows to be processed, or press Ctrl+Space to access the context variable list and select the variable tFileInputDelimited_1_NB_LINE.

  • In the To denormalize panel, use the plus button to add a line and set the parameters for the column to be denormalized. In this example, we want to denormalize the name column.

  • In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information about tLogRow, see tLogRow.

  • Save your Job and press F6 to execute it.

The result displayed on the console shows how the name column was denormalized.
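
To make the data flow of the Job easier to picture, the four steps can be sketched in plain Java. This is only an illustration, not the code Talend Studio generates: the class name RegroupSortedRowsSketch, the regroup method, the semicolon field separator, the comma used to merge names, and the sample rows are all assumptions made for this example. It also assumes that id is numeric and that every line contains both fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the Job's logic; not Talend-generated code.
final class RegroupSortedRowsSketch {

    /**
     * Parses "id<fieldSep>name" lines, sorts them by id in ascending order,
     * and merges the names of rows that share the same id.
     */
    static List<String> regroup(List<String> lines, String fieldSep, String mergeSep) {
        // Step 1 (role of tFileInputDelimited): split each line into id and name.
        // Assumes each line holds both fields.
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(Pattern.quote(fieldSep), 2));
        }

        // Step 2 (role of tSortRow): numeric ascending sort on id.
        // List.sort is stable, so rows with equal ids keep their input order.
        rows.sort(Comparator.comparingInt((String[] r) -> Integer.parseInt(r[0].trim())));

        // Step 3 (role of tDenormalizeSortedRow): because the rows are sorted,
        // rows with the same id are adjacent and can be merged in a single pass.
        List<String> out = new ArrayList<>();
        Integer currentId = null;
        StringBuilder names = new StringBuilder();
        for (String[] r : rows) {
            int id = Integer.parseInt(r[0].trim());
            if (currentId == null || id != currentId) {
                if (currentId != null) {
                    out.add(currentId + fieldSep + names);
                }
                currentId = id;
                names.setLength(0);
            } else {
                names.append(mergeSep);
            }
            names.append(r[1].trim());
        }
        if (currentId != null) {
            out.add(currentId + fieldSep + names);
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample rows standing in for the name_list file.
        List<String> sample = Arrays.asList("2;Bob", "1;Ann", "1;Al", "2;Eve");
        // Step 4 (role of tLogRow): print each regrouped row.
        regroup(sample, ";", ",").forEach(System.out::println);
    }
}
```

With the invented sample rows, the sketch prints 1;Ann,Al followed by 2;Bob,Eve. The single-pass merge only works because the sort step has already placed rows with the same id next to each other, which is why tSortRow precedes tDenormalizeSortedRow in the Job.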