tPivotToColumnsDelimited - 6.1

Talend Components Reference Guide

tPivotToColumnsDelimited Properties

Component family

File/Output

 

Function

tPivotToColumnsDelimited outputs data based on an aggregation operation carried out on a pivot column.

Purpose

tPivotToColumnsDelimited is used to fine-tune the selection of data to output.

Basic settings

Pivot column

Select the column from the incoming flow that will be used as pivot for the aggregation operation.

 

Aggregation column

Select the column from the incoming flow that contains the data to be aggregated.

 

Aggregation function

Select the function to be used in case several values are available for the pivot column.

 

Group by

Define the aggregation sets, the values of which will be used for calculations.

 

 

Input Column: Map each input column label to an output column label, in case the output label of the aggregation set needs to differ from the input label.

 

File Name

Name or path to the file to be processed and/or the variable to be used.

For further information about how to define and use a variable in a Job, see Talend Studio User Guide.

 

Field separator

Character, string or regular expression to separate fields of the output file.

 

Row separator

String (for example, "\n" on Unix) used to separate rows in the output file.
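As an illustration of how the Pivot column, Aggregation column, Aggregation function and Group by settings interact, here is a minimal Python sketch. It is not the component's implementation: the function name pivot_to_columns, the sample columns (region, quarter, sales) and the choice of an empty string for missing combinations are all assumptions made for the example.

```python
def pivot_to_columns(rows, pivot_col, agg_col, agg_func, group_by):
    """Turn each distinct value of `pivot_col` into an output column."""
    pivot_values = []   # distinct pivot values, in order of first appearance
    groups = {}         # group key -> {pivot value -> list of raw values}
    for row in rows:
        key = tuple(row[c] for c in group_by)
        pv = row[pivot_col]
        if pv not in pivot_values:
            pivot_values.append(pv)
        groups.setdefault(key, {}).setdefault(pv, []).append(row[agg_col])

    header = list(group_by) + pivot_values
    table = []
    for key, cells in groups.items():
        line = list(key)
        for pv in pivot_values:
            values = cells.get(pv)
            # Assumption: a missing combination is rendered as an empty string.
            line.append(agg_func(values) if values else "")
        table.append(line)
    return header, table


# Hypothetical sample data, not taken from the guide.
rows = [
    {"region": "North", "quarter": "Q1", "sales": 10},
    {"region": "North", "quarter": "Q1", "sales": 5},
    {"region": "North", "quarter": "Q2", "sales": 7},
    {"region": "South", "quarter": "Q1", "sales": 3},
]
header, table = pivot_to_columns(rows, "quarter", "sales", sum, ["region"])
```

Here quarter plays the role of the pivot column, sales the aggregation column, sum the aggregation function, and region the Group by column. The two North/Q1 rows are the "several values" case that the aggregation function resolves.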

Global Variables

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

NB_LINE_OUT: the number of rows written to the file by the component. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
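The difference between NB_LINE and NB_LINE_OUT can be pictured with the following Python sketch, which simulates a component that publishes both counters once it has finished. The dictionary global_map, the key naming (component name followed by the variable name), the component name tPivotToColumnsDelimited_1 and the assumption that NB_LINE_OUT counts data rows only are all hypothetical choices for this example, not statements about the generated code.

```python
def run_pivot(rows, global_map, name="tPivotToColumnsDelimited_1"):
    """Simulate a pivot that writes one output row per distinct ID."""
    nb_line = 0
    groups = []
    for row in rows:
        nb_line += 1                      # one more row transferred to the component
        if row["ID"] not in groups:
            groups.append(row["ID"])      # each distinct ID yields one output row
    # After variables: published once, when the component has finished.
    global_map[name + "_NB_LINE"] = nb_line
    global_map[name + "_NB_LINE_OUT"] = len(groups)


global_map = {}
rows = [{"ID": "1"}, {"ID": "1"}, {"ID": "2"}, {"ID": "2"}]  # hypothetical input
run_pivot(rows, global_map)
```

Because pivoting collapses several input rows into one output row, the two counters usually differ: in this sketch four rows come in and two rows are written.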

Usage

This component requires an input flow.

Limitation

Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the Install button on the Component tab view. You can also find and add all missing JARs on the Modules tab in the Integration perspective of your studio. For details, see https://help.talend.com/display/KB/How+to+install+external+modules+in+the+Talend+products or the section describing how to configure the Studio in the Talend Installation Guide.

Scenario: Using a pivot column to aggregate data

The following scenario describes a Job that aggregates data from a delimited input file, using a defined pivot column.

Dropping and linking components

  1. Drop the following components from the Palette onto the design workspace: tFileInputDelimited and tPivotToColumnsDelimited.

  2. Link the two components using a Row > Main connection.

Configuring the components

Set the input component

  1. Double-click the tFileInputDelimited component to open its Basic settings view.

  2. Browse to the input file to fill out the File Name field.

    The input file is made of three columns: ID, Question and the corresponding Answer.

  3. Define the Row and Field separators. In this example, they are a carriage return and a semicolon, respectively.

  4. As the file contains a header line, define the header as well.

  5. Set the schema describing the three columns: ID, Questions, Answers.
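The input settings above (semicolon field separator, one header line, three-column schema) can be sketched in Python as follows. This only mimics what the input component is configured to do. The sample content in SAMPLE is hypothetical, since the guide does not show the actual file, and read_delimited is an illustrative helper name.

```python
import csv
import io

SCHEMA = ["ID", "Questions", "Answers"]   # column names from the schema step

# Hypothetical file content; the guide does not show the actual data.
SAMPLE = "ID;Question;Answer\n1;Q1;yes\n1;Q2;no\n2;Q1;no\n"


def read_delimited(text, field_sep=";", header_lines=1):
    """Parse delimited text, skip the header, and label fields with SCHEMA."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=field_sep)
    for _ in range(header_lines):
        next(reader, None)                # skip the header line(s)
    return [dict(zip(SCHEMA, fields)) for fields in reader]


rows = read_delimited(SAMPLE)
```

Each resulting row is a dictionary keyed by the schema column names, which is the shape of data the pivot step then receives.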

Set the output component

  1. Double-click the tPivotToColumnsDelimited component to open its Basic settings view.

  2. In the Pivot column field, select the pivot column from the input schema. This is often the column containing the most duplicates (pivot aggregation values).

  3. In the Aggregation column field, select the column from the input schema that should be aggregated.

  4. In the Aggregation function field, select the function to be used when duplicates are found.

  5. In the Group by table, add an Input column that will be used to group the aggregated data.

  6. In the File Name field, browse to the output file path. Then, in the Row separator and Field separator fields, set the separators for the aggregated output rows and data.
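The guide does not state which columns and function are selected in this scenario. The Python sketch below assumes one plausible configuration: Questions as the pivot column, Answers as the aggregation column, a "first" aggregation function, and ID as the Group by column. The helper name write_pivot, the sample rows and the exact output layout (a header row, an empty field for a missing answer) are also assumptions.

```python
def write_pivot(rows, field_sep=";", row_sep="\n"):
    """Pivot Questions into columns, grouped by ID, and render delimited text."""
    questions = []      # distinct pivot values, in order of first appearance
    by_id = {}          # ID -> {question -> first answer seen}
    for row in rows:
        if row["Questions"] not in questions:
            questions.append(row["Questions"])
        # "first" aggregation: keep the first answer for each (ID, question) pair.
        by_id.setdefault(row["ID"], {}).setdefault(row["Questions"], row["Answers"])

    lines = [field_sep.join(["ID"] + questions)]
    for id_, answers in by_id.items():
        lines.append(field_sep.join([id_] + [answers.get(q, "") for q in questions]))
    return row_sep.join(lines) + row_sep


# Hypothetical rows; ID 1 answers Q1 twice, so the function must pick one value.
rows = [
    {"ID": "1", "Questions": "Q1", "Answers": "yes"},
    {"ID": "1", "Questions": "Q2", "Answers": "no"},
    {"ID": "1", "Questions": "Q1", "Answers": "maybe"},
    {"ID": "2", "Questions": "Q1", "Answers": "no"},
]
output = write_pivot(rows)
```

The field_sep and row_sep parameters correspond to the Field separator and Row separator settings. Changing them alters only how the aggregated rows are rendered, not how they are computed.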

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 or click Run on the Run tab to execute the Job.

    The output file shows the newly aggregated data.