Scenario: Filtering rows and groups of rows - 6.3

Talend Open Studio for Big Data Components Reference Guide

This scenario describes a three-component Job. A tRowGenerator component creates random entries and sends them directly to a tSampleRow component, which filters them according to a defined range of row numbers. In this scenario, we assume that the input flow contains the names of salespersons along with the number of products each has sold and their years of presence in the enterprise. The result of the filtering operation is displayed on the Run console.
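The Job's data flow can be pictured as three chained stages. The following Python sketch is only an analogy, not the code Talend generates: each hypothetical function (`row_generator`, `sample_row`, `log_row`) stands in for one component, and Python generators hand rows along one at a time, roughly as rows travel over a Row > Main link. The row layout and values are placeholders.

```python
from typing import Iterable, Iterator

# Hypothetical row layout: (salesperson, sold_products, years_of_presence).
Row = tuple[str, int, int]


def row_generator(count: int) -> Iterator[Row]:
    """Stand-in for tRowGenerator: emit `count` placeholder rows, one at a time."""
    for i in range(1, count + 1):
        yield (f"salesperson_{i}", i * 10, i % 7 + 1)


def sample_row(rows: Iterable[Row], keep: set[int]) -> Iterator[Row]:
    """Stand-in for tSampleRow: pass through rows whose 1-based position is in `keep`."""
    for position, row in enumerate(rows, start=1):
        if position in keep:
            yield row


def log_row(rows: Iterable[Row]) -> list[Row]:
    """Stand-in for tLogRow: print each incoming row and collect it."""
    seen = []
    for row in rows:
        print(row)
        seen.append(row)
    return seen


# Chain the stages the way Row > Main links chain the components.
result = log_row(sample_row(row_generator(15), keep={1, 5, 9, 10, 11, 12}))
```

With 15 generated rows, `result` holds rows 1, 5, 9, 10, 11 and 12 in their original order.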

Dropping and linking the components

  1. Drop the following components from the Palette onto the design workspace: tRowGenerator, tSampleRow, and tLogRow.

  2. Connect the three components using Row > Main links.

Configuring the components

  1. In the design workspace, select tRowGenerator and click the Component tab to define its basic settings.

  2. Click the [...] button next to Edit Schema to define the data you want to use as input. In this scenario, the schema is made of five columns.

  3. In the Basic settings view, click RowGenerator Editor to define the data to be generated.

  4. In the RowGenerator Editor, specify the number of rows to be generated in the Number of Rows for RowGenerator field and click OK. The RowGenerator Editor closes.

  5. In the design workspace, select tSampleRow and click the Component tab to define the basic settings for tSampleRow.

  6. In the Basic settings view, set the Schema to Built-In and click Sync columns to retrieve the schema from the tRowGenerator component.

  7. In the Range panel, enter the rows to select using the syntax described in the tSampleRow properties. In this scenario, we want to select rows 1 and 5 along with the group of rows from 9 to 12.

  8. In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information about tLogRow, see tLogRow.
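The Range value set in step 7 selects rows by their position in the incoming flow. Assuming a syntax of comma-separated single row numbers plus `start..end` for an inclusive group of rows (so this scenario's selection would read `1,5,9..12`; check the tSampleRow properties for the exact grammar), the selection logic can be sketched in Python as follows. `parse_range` and `filter_rows` are hypothetical helpers, not Talend APIs.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def parse_range(spec: str) -> set[int]:
    """Parse an assumed range spec such as "1,5,9..12" into 1-based row numbers.

    Assumed grammar: comma-separated items, each either a single row number
    or an inclusive group written as "start..end".
    """
    selected: set[int] = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray or trailing commas
        if ".." in item:
            start_text, end_text = item.split("..", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"group {item!r} has a start greater than its end")
            selected.update(range(start, end + 1))  # both bounds included
        else:
            selected.add(int(item))
    return selected


def filter_rows(rows: Sequence[T], spec: str) -> list[T]:
    """Keep only the rows whose 1-based position matches the range spec."""
    wanted = parse_range(spec)
    return [row for position, row in enumerate(rows, start=1) if position in wanted]
```

For example, `filter_rows(rows, "1,5,9..12")` keeps rows 1, 5, 9, 10, 11 and 12; in this sketch a group such as `5..3` raises `ValueError`.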

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6, or click Run on the Run tab to execute the Job.

    The filtering result displayed on the console shows rows 1 and 5 and the group of rows from 9 to 12.
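To picture the console output, here is a rough Python stand-in for tLogRow printing each row on one line with fields joined by a separator. The `|` default is an assumption about tLogRow's basic display mode, `log_rows` is a hypothetical helper, and the rows are placeholders built only from the row numbers this scenario selects.

```python
from typing import Iterable, Sequence


def log_rows(rows: Iterable[Sequence[object]], separator: str = "|") -> list[str]:
    """Rough stand-in for tLogRow: print one line per row, fields joined by `separator`."""
    lines = []
    for row in rows:
        line = separator.join(str(field) for field in row)
        print(line)
        lines.append(line)
    return lines


# Placeholder rows: (original row number, salesperson, sold products).
filtered = [(n, f"salesperson_{n}", n * 10) for n in (1, 5, 9, 10, 11, 12)]
lines = log_rows(filtered)
```

The first printed line is `1|salesperson_1|10`, and the leading field of each line shows which of the original rows survived the filter.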