Sorting entries - Cloud - 8.0

Processing (Integration)

Version
Cloud
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-03-05

This scenario describes a three-component Job. A tRowGenerator creates random entries, which are sent directly to a tSortRow to be ordered according to defined sort criteria. In this scenario, the input flow is assumed to contain the names of salespersons along with their respective sales and their years of presence in the company. The result of the sorting operation is displayed on the Run console.

For more technologies supported by Talend, see Talend components.

  • Drop the three components required for this use case: tRowGenerator, tSortRow and tLogRow from the Palette to the design workspace.

  • Connect them together using Row main links.

  • In the tRowGenerator editor, define the random values to be sorted by the tSortRow component. For more information about this component, see trowgenerator_c.html.

  • In this scenario, we want to rank the salespersons according to their Sales value and their number of years in the company.

  • Double-click tSortRow to display the Basic settings tab panel. Set the Sales value as the primary sort criterion and the number of years in the company as the secondary criterion.

  • Use the plus button to add the required number of rows, here one per sort criterion. Set the type of sorting: both criteria are integers, so the sort is numerical. Finally, because the desired output is a ranking, set the order to descending.

  • Display the Advanced Settings tab and select the Sort on disk check box to modify the temporary memory parameters. In the Temp data directory path field, type the path to the directory where you want to store the temporary data. In the Buffer size of external sort field, set the maximum buffer value you want to allocate to the processing.

Warning:

The default buffer value is 1000000, but the more rows and/or columns you process, the higher the value needs to be to prevent the Job from stopping automatically. When that happens, an "out of memory" error message is displayed.

  • Make sure you have connected this flow to the output component, tLogRow, to display the result in the Job console.

  • Press F6 to run the Job. The ranking is based first on the Sales value and then on the number of years in the company.
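The steps above can be sketched in plain Java to show the ordering the Job is expected to produce. This is an illustrative sketch only, not the code that Talend Studio generates: the class SortEntriesSketch, the Entry type, the methods generate and sort, and the value ranges for sales and years are all hypothetical names and assumptions chosen for this example. The three methods stand in for tRowGenerator, tSortRow and tLogRow respectively.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class SortEntriesSketch {

    // Hypothetical row type mirroring the scenario's schema:
    // salesperson name, sales, and years in the company.
    public static final class Entry {
        public final String name;
        public final int sales;
        public final int years;

        public Entry(String name, int sales, int years) {
            this.name = name;
            this.sales = sales;
            this.years = years;
        }
    }

    // Stand-in for tRowGenerator: builds random rows.
    // The seed and the value ranges are assumptions made for this sketch.
    public static List<Entry> generate(int count, long seed) {
        Random random = new Random(seed);
        List<Entry> rows = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int sales = random.nextInt(10) * 1000; // 0 to 9000, so ties are likely
            int years = 1 + random.nextInt(20);    // 1 to 20
            rows.add(new Entry("Salesperson" + i, sales, years));
        }
        return rows;
    }

    // Stand-in for tSortRow: numerical sort, Sales descending first,
    // then years in the company descending to break ties.
    public static List<Entry> sort(List<Entry> rows) {
        Comparator<Entry> bySalesDesc =
                Comparator.comparingInt((Entry e) -> e.sales).reversed();
        Comparator<Entry> byYearsDesc =
                Comparator.comparingInt((Entry e) -> e.years).reversed();
        List<Entry> sorted = new ArrayList<>(rows); // leave the input untouched
        sorted.sort(bySalesDesc.thenComparing(byYearsDesc));
        return sorted;
    }

    // Stand-in for tLogRow: prints each sorted row to the console.
    public static void main(String[] args) {
        for (Entry e : sort(generate(10, 42L))) {
            System.out.println(e.name + "|" + e.sales + "|" + e.years);
        }
    }
}
```

The secondary comparator is only consulted when two rows have the same Sales value, which matches the primary and secondary criteria set in the Basic settings of tSortRow. Unlike the Sort on disk option, this sketch sorts everything in memory.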