Aggregating values based on dynamic schema - 6.3

Talend Components Reference Guide

This example uses the tAggregateRow component to aggregate task assignment data from a CSV file, grouping the data by a dynamic schema column.

Creating a Job for aggregating values based on dynamic schema

Create a Job to aggregate some task assignment data in a CSV file based on a dynamic schema column using the tAggregateRow component, then display the aggregated data on the console and write it into an output CSV file.

  1. Create a new Job and add a tFileInputDelimited component, a tAggregateRow component, a tLogRow component, and a tFileOutputDelimited component by typing their names in the design workspace or dropping them from the Palette.

  2. Link the tFileInputDelimited component to the tAggregateRow component using a Row > Main connection.

  3. Do the same to link the tAggregateRow component to the tLogRow component, and the tLogRow component to the tFileOutputDelimited component.
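
The data flow set up by these connections can be pictured as a chain of stages, each consuming the rows emitted by the previous one. The Python sketch below is only an analogy and is not the code that Talend Studio generates. The function names (read_rows, aggregate, log_rows, write_rows, run_job) are hypothetical stand-ins for the four components, and the aggregation stage is a placeholder that forwards rows unchanged.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]


def read_rows(rows: Iterable[Row]) -> Iterator[Row]:
    """Stand-in for tFileInputDelimited: emits the input rows one by one."""
    for row in rows:
        yield row


def aggregate(rows: Iterable[Row]) -> Iterator[Row]:
    """Stand-in for tAggregateRow (placeholder: forwards rows unchanged)."""
    yield from rows


def log_rows(rows: Iterable[Row], log: Callable[[str], None] = print) -> Iterator[Row]:
    """Stand-in for tLogRow: logs each row, then passes it downstream."""
    for row in rows:
        log(str(row))
        yield row


def write_rows(rows: Iterable[Row], sink: List[Row]) -> None:
    """Stand-in for tFileOutputDelimited: collects rows into a sink."""
    for row in rows:
        sink.append(row)


def run_job(source: Iterable[Row], sink: List[Row],
            log: Callable[[str], None] = print) -> None:
    """Chain the stages in the same order as the Row > Main connections."""
    write_rows(log_rows(aggregate(read_rows(source)), log), sink)
```

The point of the sketch is the ordering: the logging stage sits between the aggregation and the file output, so it has to pass every row on after printing it.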

Configuring the Job for aggregating values based on dynamic schema

Configure the Job to aggregate some task assignment data in a CSV file based on a dynamic schema column using the tAggregateRow component, then display the aggregated data on the console using the tLogRow component and write it into an output CSV file using the tFileOutputDelimited component.

  1. Double-click the tFileInputDelimited component to open its Basic settings view.

  2. In the File name/Stream field, specify the path to the CSV file that holds the following task assignment data, D:/tasks.csv in this example.

    task;team;status
    task1;team1;done
    task2;team2;done
    task3;team1;done
    task4;team2;pending
    task5;team1;pending
    task6;team2;pending

  3. In the Header field, enter the number of rows to skip at the beginning of the file, 1 in this example.

    Note that the dynamic schema feature is only supported in the Built-In mode and requires the input file to have a header row.

  4. Click the button next to Edit schema to open the schema dialog box and define the schema by adding two columns, task of String type and other of Dynamic type. When done, click OK to save the changes and close the schema dialog box.

    Note that the dynamic column must be defined in the last row of the schema. For more information about dynamic schema, see the Talend Studio User Guide.

  5. Double-click the tAggregateRow component, and on its Basic settings view, click the Sync columns button to retrieve the schema from the preceding component.

  6. Add one row to the Group by table by clicking the button below it, and select other in both the Output column and the Input column position fields to group the input data by the other dynamic column.

    Note that a dynamic column can take part in the aggregation only as a grouping column.

  7. Add one row to the Operations table and define the operation to be carried out. In this example, the operation function is list. Then select task in both the Output column and the Input column position fields to list the entries of the task column in each group.

  8. Double-click the tLogRow component to open its Basic settings view, and then select Table (print values in cells of a table) in the Mode area for better readability of the result.

  9. Double-click the tFileOutputDelimited component to open its Basic settings view, and in the File Name field, specify the path to the CSV file into which the aggregated data will be written, D:/tasks_aggregated.csv in this example.

  10. Select the Include Header check box to include the header of each column in the CSV file.
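
To make the configured logic concrete, the following Python sketch mimics it outside of Talend Studio: the task column is treated as the fixed column, every remaining column named in the header row plays the role of the other dynamic column, rows are grouped by the dynamic values, and the tasks in each group are collected into a list. This is an illustration under those assumptions, not the component's implementation. The function name aggregate_by_dynamic and the SAMPLE constant are hypothetical, and the console and file formatting produced by the actual Job may differ.

```python
import csv
import io
from typing import Dict, List, Tuple

# The sample data from the tasks.csv file used in this example.
SAMPLE = """task;team;status
task1;team1;done
task2;team2;done
task3;team1;done
task4;team2;pending
task5;team1;pending
task6;team2;pending
"""


def aggregate_by_dynamic(text: str, fixed: str = "task",
                         sep: str = ";") -> Tuple[List[str], Dict[tuple, List[str]]]:
    """Group rows by all columns except `fixed` and list the `fixed` values."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    header = next(reader)  # the header row names the dynamic columns
    fixed_idx = header.index(fixed)
    dynamic_names = [name for i, name in enumerate(header) if i != fixed_idx]

    groups: Dict[tuple, List[str]] = {}  # dicts keep insertion order
    for row in reader:
        if not row:  # skip blank lines
            continue
        key = tuple(value for i, value in enumerate(row) if i != fixed_idx)
        groups.setdefault(key, []).append(row[fixed_idx])
    return dynamic_names, groups


if __name__ == "__main__":
    names, groups = aggregate_by_dynamic(SAMPLE)
    for key, tasks in groups.items():
        print(",".join(tasks), dict(zip(names, key)))
```

Because the grouping key is built from whatever columns the header declares, adding a column to the input file changes the grouping without any change to the code, which is the behavior a dynamic schema column is meant to provide.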

Executing the Job to aggregate values based on dynamic schema

After setting up the Job and configuring its components to aggregate the task assignment data based on a dynamic schema column, execute the Job and verify the result.

  1. Press Ctrl + S to save the Job.

  2. Press F6 to execute the Job.

    The task assignment data is aggregated based on the other dynamic column, and the aggregated data is displayed on the console and written into the output CSV file.
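
One way to verify the output file is to check that every task from the input appears exactly once across the aggregated rows. The Python sketch below does this under stated assumptions that are not confirmed by this guide: the output file uses a semicolon as the field separator, starts with a header row containing a task column, and joins the listed tasks with commas. The function names listed_tasks and missing_or_duplicated are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple


def listed_tasks(aggregated_text: str, task_column: str = "task",
                 field_sep: str = ";", list_sep: str = ",") -> List[str]:
    """Return every task named in the task column of the aggregated data."""
    reader = csv.DictReader(io.StringIO(aggregated_text), delimiter=field_sep)
    tasks: List[str] = []
    for row in reader:
        # Assumption: the list function joins values with `list_sep`.
        tasks.extend(t.strip() for t in row[task_column].split(list_sep) if t.strip())
    return tasks


def missing_or_duplicated(input_tasks: List[str],
                          aggregated_text: str) -> Tuple[List[str], List[str]]:
    """Report input tasks that are absent from, or repeated in, the output."""
    counts = Counter(listed_tasks(aggregated_text))
    missing = [t for t in input_tasks if counts[t] == 0]
    duplicated = [t for t in input_tasks if counts[t] > 1]
    return missing, duplicated
```

To use it, read the content of D:/tasks_aggregated.csv into a string and pass it together with the six task names from the input file; two empty lists indicate that no task was lost or repeated during the aggregation.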