tDenormalizeSortedRow - 6.3

Talend Open Studio for Big Data Components Reference Guide

Function

tDenormalizeSortedRow groups all sorted input rows. Within each group, the distinct values of the denormalized column are joined with item separators.

Purpose

tDenormalizeSortedRow helps synthesize a sorted input flow to save memory.
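The behavior described above can be illustrated with a minimal Java sketch. It is not the component's generated code: the class name, method name, and two-column row layout (key, value) are assumptions made for illustration. Because the input is already sorted, only the values of the current group need to be held in memory at any time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration; not the code generated by Talend Studio.
public class DenormalizeSortedSketch {

    /**
     * Expects rows of the form {key, value}, already sorted by key.
     * Returns one row per key, with distinct values joined by the separator.
     */
    public static List<String[]> denormalize(List<String[]> sortedRows, String separator) {
        List<String[]> out = new ArrayList<>();
        Set<String> values = new LinkedHashSet<>(); // keeps distinct values in arrival order
        String currentKey = null;
        boolean groupOpen = false;
        for (String[] row : sortedRows) {
            // A change of key closes the current group.
            if (groupOpen && !Objects.equals(row[0], currentKey)) {
                out.add(new String[] {currentKey, String.join(separator, values)});
                values.clear();
            }
            currentKey = row[0];
            groupOpen = true;
            values.add(row[1]);
        }
        // Flush the last group.
        if (groupOpen) {
            out.add(new String[] {currentKey, String.join(separator, values)});
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "Anna"},
            new String[] {"1", "Bob"},
            new String[] {"1", "Anna"},
            new String[] {"2", "Carl"});
        for (String[] r : denormalize(rows, ",")) {
            System.out.println(r[0] + "|" + r[1]);
        }
    }
}
```

In this sketch the duplicate value in the first group is dropped, so the two printed rows are `1|Anna,Bob` and `2|Carl`.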

tDenormalizeSortedRow properties

Component family

Processing/Fields

 

Basic settings

Schema and Edit Schema

A schema is a row description: it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

Click Sync columns to retrieve the schema from the previous component in the Job.

Built-in: You create the schema and store it locally for the relevant component. Related topic: see Talend Studio User Guide.

Repository: The schema already exists and is stored in the Repository, so it can be reused in various projects and Job flowcharts. Related topic: see Talend Studio User Guide.

 

Input rows count

Enter the number of input rows to be processed.

 

To denormalize

Enter the name of the column to denormalize.

Advanced settings

tStatCatcher Statistics

Select this check box to collect the log data at component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
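As a hedged illustration of how an After variable such as NB_LINE can be read in a Java expression: generated Jobs expose these variables through a map named globalMap, keyed by the component name followed by the variable name (for example tFileInputDelimited_1_NB_LINE). In the sketch below, a plain HashMap stands in for the map that the Studio supplies, and the class and helper names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; in a real Job the Studio supplies globalMap.
public class GlobalVariableSketch {

    /** Reads <componentName>_NB_LINE, returning 0 when the variable is not set yet. */
    public static int readNbLine(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_NB_LINE");
        return value instanceof Integer ? (Integer) value : 0;
    }

    public static void main(String[] args) {
        // Stand-in for the map populated at run time.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileInputDelimited_1_NB_LINE", 5);

        System.out.println(readNbLine(globalMap, "tFileInputDelimited_1"));
    }
}
```

Because NB_LINE is an After variable, the value is only meaningful once the component that sets it has finished executing, which is why the helper falls back to 0 when the key is absent.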

Usage

This component handles flows of data; it therefore requires input and output components.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

n/a

Scenario: Regrouping sorted rows

This Java scenario describes a four-component Job. It reads a given delimited file row by row, sorts the input data by sort type and order, denormalizes all the sorted input rows, and displays the output on the Run log console.

  • Drop the following components from the Palette onto the design workspace: tFileInputDelimited, tSortRow, tDenormalizeSortedRow, and tLogRow.

  • Connect the four components using Row Main links.

  • In the design workspace, select tFileInputDelimited.

  • Click the Component tab to define the basic settings for tFileInputDelimited.

  • Set Property Type to Built-In.

  • Fill in a path to the processed file in the File Name field. The name_list file used in this example holds two columns, id and first name.

  • If needed, define row and field separators, header and footer, and the number of processed rows.

  • Set Schema to Built-In and click the three-dot button next to Edit Schema to define the data to pass on to the next component. The schema in this example consists of two columns, id and name.

  • In the design workspace, select tSortRow.

  • Click the Component tab to define the basic settings for tSortRow.

  • Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the tFileInputDelimited component.

  • In the Criteria panel, use the plus button to add a line and set the sorting parameters for the schema column to be processed. In this example, we want to sort the id column in ascending order.

  • In the design workspace, select tDenormalizeSortedRow.

  • Click the Component tab to define the basic settings for tDenormalizeSortedRow.

  • Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the tSortRow component.

  • In the Input rows count field, enter the number of input rows to be processed, or press Ctrl+Space to access the variable list and select the variable tFileInputDelimited_1_NB_LINE.

  • In the To denormalize panel, use the plus button to add a line and set the parameters for the column to be denormalized. In this example, we want to denormalize the name column.

  • In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information about tLogRow, see tLogRow.

  • Save your Job and press F6 to execute it.

The result displayed on the console shows how the name column was denormalized.
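The scenario as a whole can be approximated in plain Java. This is a sketch under stated assumptions, not the code the Studio generates: the sample lines, the semicolon field separator, the comma item separator, and the pipe-separated output format are all hypothetical, since the contents of name_list are not given here. Using the row count to recognize the last row is also an assumption about the role of Input rows count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the four-component Job.
public class RegroupScenarioSketch {

    /** Sorts "id;name" lines by id, then denormalizes name one row at a time. */
    public static List<String> run(List<String> lines) {
        // Stand-in for tFileInputDelimited: split each line into id and name.
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(";", 2));
        }

        // Stand-in for tSortRow: ascending numeric sort on the id column.
        rows.sort(Comparator.comparingInt((String[] r) -> Integer.parseInt(r[0])));

        // Stand-in for tDenormalizeSortedRow. The row count plays the role of
        // Input rows count; using it to detect the last row is an assumption.
        int inputRowsCount = rows.size();
        List<String> output = new ArrayList<>();
        Set<String> names = new LinkedHashSet<>();
        String currentId = null;
        int seen = 0;
        for (String[] row : rows) {
            seen++;
            // A new id closes the previous group.
            if (currentId != null && !currentId.equals(row[0])) {
                output.add(currentId + "|" + String.join(",", names));
                names.clear();
            }
            currentId = row[0];
            names.add(row[1]);
            // The last expected row flushes the final group.
            if (seen == inputRowsCount) {
                output.add(currentId + "|" + String.join(",", names));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        List<String> lines = Arrays.asList("2;Paul", "1;Anna", "3;Zoe", "1;John", "2;Paul");
        // Stand-in for tLogRow: print each denormalized row.
        run(lines).forEach(System.out::println);
    }
}
```

With the hypothetical sample data above, the sketch prints three rows: `1|Anna,John`, `2|Paul`, and `3|Zoe`.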