Scenario: Replicating a flow and sorting two identical flows respectively - 6.3

Talend Open Studio for Big Data Components Reference Guide

This scenario describes a Job that reads an input flow of names and states from a CSV file, replicates that flow, sorts one copy by name and the other by state, and displays both sorted results on the console.

Setting up the Job

  1. Drop the following components from the Palette to the design workspace: one tFileInputDelimited component, one tReplicate component, two tSortRow components, and two tLogRow components.

  2. Connect tFileInputDelimited to tReplicate using a Row > Main link.

  3. In the same way, connect tReplicate to each of the two tSortRow components, then connect each tSortRow component to its own tLogRow component.

  4. Label the components to better identify their functions.
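
The layout built in the steps above is one input flow feeding two independent branches. The following minimal Java sketch illustrates that idea only; it is not the code Talend Studio generates. The class name `ReplicateSketch` and the `replicate` helper are hypothetical names introduced for this illustration, and the rows are reduced to plain name strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReplicateSketch {

    // Hypothetical stand-in for tReplicate: gives each output branch its own copy of the rows.
    static <T> List<List<T>> replicate(List<T> input, int branches) {
        List<List<T>> outputs = new ArrayList<>();
        for (int i = 0; i < branches; i++) {
            outputs.add(new ArrayList<>(input));
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Stand-in for the rows delivered by tFileInputDelimited.
        List<String> rows = Arrays.asList("Chester Polk", "Andrew Kennedy", "Calvin Grant");

        List<List<String>> flows = replicate(rows, 2);

        // Sorting the first branch leaves the second branch in input order.
        Collections.sort(flows.get(0));

        System.out.println("Branch 1: " + flows.get(0));
        System.out.println("Branch 2: " + flows.get(1));
    }
}
```

Because each branch holds its own copy, whatever one branch does to its rows (here, a sort) has no effect on the other branch.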

Configuring the components

  1. Double-click the tFileInputDelimited component to open its Basic settings view in the Component tab.

  2. Click the [...] button next to the File name/Stream field to browse to the file from which you want to read the input flow. In this example, the input file is Names&States.csv, which contains two columns: name and state.

    name;state
    Andrew Kennedy;Mississippi
    Benjamin Carter;Louisiana
    Benjamin Monroe;West Virginia
    Bill Harrison;Tennessee
    Calvin Grant;Virginia
    Chester Harrison;Rhode Island
    Chester Hoover;Kansas
    Chester Kennedy;Maryland
    Chester Polk;Indiana
    Dwight Nixon;Nevada
    Dwight Roosevelt;Mississippi
    Franklin Grant;Nebraska

  3. Fill in the Header, Footer and Limit fields according to your needs. In this example, enter 1 in the Header field to skip the first row of the input file, which holds the column names.

  4. Click Edit schema to define the data structure of the input flow. In this example, the schema contains two columns: name and state.

  5. Double-click the first tSortRow component to open its Basic settings view.

  6. In the Criteria panel, click the [+] button to add one row and set the sorting parameters for the schema column to be processed. To sort the input data by name, select name under Schema column. Select alpha as the sorting type and asc as the sorting order.

    For more information about these parameters, see tSortRow properties.

  7. Double-click the second tSortRow component and repeat the step above, this time selecting state under Schema column, to sort the second flow by state.

  8. In the Basic settings view of each tLogRow component, select Table in the Mode area for a better view of the Job execution result.
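
The settings above amount to three operations: skip one header row, split each line on the semicolon, and sort ascending on a single column. The following Java sketch approximates them under stated assumptions: `SortCriteriaSketch`, `parse`, and `sortBy` are hypothetical names, and the alpha/asc criterion is modeled with Java's default `String` ordering, which may differ from tSortRow's behavior in details such as case handling or the order of rows with equal keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortCriteriaSketch {

    // Column positions in the two-column schema (name, state).
    static final int NAME = 0;
    static final int STATE = 1;

    // Splits semicolon-delimited lines into columns, skipping the first `header` lines.
    static List<String[]> parse(List<String> lines, int header) {
        List<String[]> rows = new ArrayList<>();
        for (int i = header; i < lines.size(); i++) {
            rows.add(lines.get(i).split(";"));
        }
        return rows;
    }

    // Returns a copy of the rows sorted in ascending String order on one column.
    static List<String[]> sortBy(List<String[]> rows, int column) {
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((String[] row) -> row[column]));
        return sorted;
    }

    public static void main(String[] args) {
        // First lines of the sample file used in this scenario.
        List<String> lines = Arrays.asList(
                "name;state",
                "Andrew Kennedy;Mississippi",
                "Benjamin Carter;Louisiana",
                "Benjamin Monroe;West Virginia",
                "Bill Harrison;Tennessee");

        List<String[]> rows = parse(lines, 1); // Header = 1

        System.out.println("Sorted by name:");
        for (String[] row : sortBy(rows, NAME)) {
            System.out.println(row[NAME] + " | " + row[STATE]);
        }

        System.out.println("Sorted by state:");
        for (String[] row : sortBy(rows, STATE)) {
            System.out.println(row[NAME] + " | " + row[STATE]);
        }
    }
}
```

Each call to `sortBy` works on its own copy of the rows, which mirrors the two tSortRow components each receiving a separate flow from tReplicate.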

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Execute the Job by pressing F6 or clicking Run on the Run tab.

    The console displays two tables: one with the data sorted by name and one with the data sorted by state.