Scenario: Regrouping sorted rows - 6.1

Talend Open Studio for Big Data Components Reference Guide

This Java scenario describes a four-component Job that reads a delimited file row by row, sorts the input data by the chosen sort type and order, denormalizes the sorted rows, and displays the output on the Run log console.

  • Drop the following components from the Palette onto the design workspace: tFileInputDelimited, tSortRow, tDenormalizeSortedRow, and tLogRow.

  • Connect the four components using Row Main links.

  • In the design workspace, select tFileInputDelimited.

  • Click the Component tab to define the basic settings for tFileInputDelimited.

  • Set Property Type to Built-In.

  • In the File Name field, enter the path to the file to be processed. The name_list file used in this example holds two columns, id and first name.

  • If needed, define row and field separators, header and footer, and the number of processed rows.

  • Set Schema to Built-In and click the three-dot button next to Edit Schema to define the data to pass on to the next component. The schema in this example consists of two columns, id and name.

  • In the design workspace, select tSortRow.

  • Click the Component tab to define the basic settings for tSortRow.

  • Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the tFileInputDelimited component.

  • In the Criteria panel, use the plus button to add a line and set the sorting parameters for the schema column to be processed. In this example we want to sort the id column in ascending order.

  • In the design workspace, select tDenormalizeSortedRow.

  • Click the Component tab to define the basic settings for tDenormalizeSortedRow.

  • Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the tSortRow component.

  • In the Input rows count field, enter the number of input rows to be processed, or press Ctrl+Space to access the context variable list and select the variable tFileInputDelimited_1_NB_LINE.

  • In the To denormalize panel, use the plus button to add a line and set the parameters for the column to be denormalized. In this example we want to denormalize the name column.

  • In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information about tLogRow, see tLogRow.

  • Save your Job and press F6 to execute it.

The result displayed on the console shows how the name column was denormalized.
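
To make the data flow easier to picture, the following is a minimal plain-Java sketch of what the Job does conceptually: parse the delimited rows, sort them by id in ascending order, then merge consecutive rows that share an id by joining their name values. It is not the code Talend generates. The class name DenormalizeSketch, the run method, the sample rows, the separators, and the "id|names" output layout are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Talend-generated code.
class DenormalizeSketch {

    // Parses "id<fieldSep>name" lines, sorts them by id (ascending, numeric),
    // then merges rows sharing an id by joining their names with joinDelim.
    // The "id|names" output layout is an assumption for this sketch.
    static List<String> run(List<String> lines, String fieldSep, String joinDelim) {
        // Step 1: read each delimited line into its two columns (id, name).
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(Pattern.quote(fieldSep), 2));
        }

        // Step 2: sort on the id column. List.sort is stable, so rows with
        // the same id keep their original relative order.
        rows.sort(Comparator.comparingInt((String[] r) -> Integer.parseInt(r[0].trim())));

        // Step 3: denormalize. Because the input is sorted, rows with the
        // same id are adjacent and can be grouped in a single pass.
        List<String> out = new ArrayList<>();
        String currentId = null;
        List<String> names = new ArrayList<>();
        for (String[] row : rows) {
            String id = row[0].trim();
            if (currentId != null && !currentId.equals(id)) {
                out.add(currentId + "|" + String.join(joinDelim, names));
                names.clear();
            }
            names.add(row[1].trim());
            currentId = id;
        }
        if (currentId != null) {
            out.add(currentId + "|" + String.join(joinDelim, names));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for the name_list file.
        List<String> sample = List.of("2;Bob", "1;Ann", "1;Al");
        for (String line : run(sample, ";", ",")) {
            System.out.println(line);
        }
    }
}
```

The single-pass grouping in step 3 only works because the rows arrive sorted, which is why tSortRow precedes tDenormalizeSortedRow in the Job.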