Scenario 1: Sorting entries - 6.1

Talend Components Reference Guide

EnrichVersion
6.1
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for Data Quality
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
task
Data Governance
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

This scenario describes a three-component Job. A tRowGenerator is used to create random entries which are directly sent to a tSortRow to be ordered following a defined value entry. In this scenario, we suppose the input flow contains names of salespersons along with their respective sales and their years of presence in the company. The result of the sorting operation is displayed on the Run console.

  • Drop the three components required for this use case: tRowGenerator, tSortRow and tLogRow from the Palette to the design workspace.

  • Connect them together using Row main links.

  • On the tRowGenerator editor, define the values to be randomly used in the Sort component. For more information regarding the use of this particular component, see tRowGenerator

  • In this scenario, we want to rank each salesperson according to its Sales value and to its number of years in the company.

  • Double-click tSortRow to display the Basic settings tab panel. Set the sort priority on the Sales value and as secondary criteria, set the number of years in the company.

  • Use the plus button to add the number of rows required. Set the type of sorting, in this case, both criteria being integer, the sort is numerical. At last, given that the output wanted is a rank classification, set the order as descending.

  • Display the Advanced Settings tab and select the Sort on disk check box to modify the temporary memory parameters. In the Temp data directory path field, type the path to the directory where you want to store the temporary data. In the Buffer size of external sort field, set the maximum buffer value you want to allocate to the processing.

Warning

The default buffer value is 1000000 but the more rows and/or columns you process, the higher the value needs to be to prevent the Job from automatically stopping. In that event, an "out of memory" error message displays.

  • Make sure you connected this flow to the output component, tLogRow, to display the result in the Job console.

  • Press F6 to run the Job. The ranking is based first on the Sales value and then on the number of years of experience.