tSplitRow - 6.3

Talend Open Studio for Big Data Components Reference Guide


Function

tSplitRow splits one row into several rows.

Purpose

This component splits one input row into several output rows.

tSplitRow properties

Component family

Processing/Fields

Basic settings

Schema and Edit Schema

A schema is a row description. It defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

Click Sync columns to retrieve the schema from the previous component connected in the Job.

Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.

Repository: The schema already exists and is stored in the Repository, so it can be reused in various projects and Job flowcharts. Related topic: see Talend Studio User Guide.

Columns mapping

Click the plus button to add one line for each output row to be generated. In each line, map input columns onto the output columns.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable works only if the component has a Die on error check box and that check box is cleared.

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
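As an illustration of how After variables are typically read in Java code placed downstream of the component, the sketch below looks them up in a map by key. It assumes the usual Talend convention of a globalMap keyed by component name plus variable name (for example tSplitRow_1_NB_LINE); here a plain HashMap stands in for the map that the generated Job code provides, and the stored value is simulated, so treat the class and method names as hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVariableSketch {

    // Reads the NB_LINE After variable of a component; returns 0 if it is not set yet.
    // The key pattern "<componentName>_NB_LINE" is assumed from the usual Talend convention.
    static int readNbLine(Map<String, Object> globalMap, String componentName) {
        Integer count = (Integer) globalMap.get(componentName + "_NB_LINE");
        return count == null ? 0 : count;
    }

    // Reads the ERROR_MESSAGE After variable; returns null if no error was recorded.
    static String readErrorMessage(Map<String, Object> globalMap, String componentName) {
        return (String) globalMap.get(componentName + "_ERROR_MESSAGE");
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap of a running Job (simulated content).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tSplitRow_1_NB_LINE", 2);

        System.out.println("Rows: " + readNbLine(globalMap, "tSplitRow_1"));
        System.out.println("Error: " + readErrorMessage(globalMap, "tSplitRow_1"));
    }
}
```

Because both variables are After variables, such a lookup is only meaningful once the component has finished executing, for example in a subjob triggered after it.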

Usage

This component splits one input row into multiple output rows by mapping input columns onto output columns.

Limitation

n/a

Scenario 1: Splitting one row into two rows

This scenario describes a three-component Job. A row of data containing information about two companies is split into two rows.

  1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tSplitRow and tLogRow.

  2. Connect them together using Row Main connections.

  3. Double-click tFixedFlowInput to open its Basic settings view.

  4. Select Use Inline Content(delimited file) in the Mode area.

  5. Fill the Content area with the following data:

    Talend;LA;California;537;5thAvenue;IT;Lionbridge;Memphis;Tennessee;537;Lincoln Road;IT Service;

  6. Click Edit schema to open a dialog box to edit the schema for the input data.

  7. Click the plus button to add twelve lines for the input columns: Company, City, State, CountryCode, Street, Industry, Company2, City2, State2, CountryCode2, Street2 and Industry2.

  8. Click OK to close the dialog box.

  9. Double-click tSplitRow to open its Basic settings view.

  10. Click Edit schema to set the schema for the output data.

  11. Click the plus button beneath the tSplitRow_1(Output) table to add four lines for the output columns: Company, CountryCode, Address and Industry.

  12. Click OK to close the dialog box. An empty table with the column names defined in the preceding step appears in the Columns mapping area.

  13. Click the plus button beneath the empty table in the Columns mapping area to add two lines for the output rows.

  14. Fill the table in the Columns mapping area, column by column, with the following values (first line, second line):

    Company: row1.Company, row1.Company2;

    CountryCode: row1.CountryCode, row1.CountryCode2;

    Address: row1.Street+","+row1.City+","+row1.State, row1.Street2+","+row1.City2+","+row1.State2;

    Industry: row1.Industry, row1.Industry2;

    Note

    The value in the Address column, for example row1.Street+","+row1.City+","+row1.State, produces a complete address by concatenating the values of the Street, City and State columns, separated by commas. The "row1" used in the values of each column refers to the input row from tFixedFlowInput.

  15. Double-click tLogRow to open its Basic settings view.

  16. Click Sync columns to retrieve the schema defined in the preceding component.

  17. Select Table in the Mode area.

  18. Save the Job and press F6 to run it.

The single input row is split into two output rows, one per company, each with the same four columns: Company, CountryCode, Address and Industry.
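The effect of the two mapping lines in this scenario can be approximated in plain Java. This is only a sketch of the logic, not the code that Talend generates: the InputRow class, the parse and split methods, and the pipe-separated printing are hypothetical stand-ins for row1, tFixedFlowInput, tSplitRow and tLogRow respectively.

```java
import java.util.ArrayList;
import java.util.List;

class SplitRowSketch {

    // Hypothetical stand-in for the row1 structure built from the twelve-column input schema.
    static class InputRow {
        String Company, City, State, CountryCode, Street, Industry;
        String Company2, City2, State2, CountryCode2, Street2, Industry2;
    }

    // Parses one semicolon-delimited line into the twelve input columns, in schema order.
    static InputRow parse(String line) {
        String[] f = line.split(";"); // the trailing empty field is dropped by split
        InputRow r = new InputRow();
        r.Company = f[0];  r.City = f[1];  r.State = f[2];
        r.CountryCode = f[3];  r.Street = f[4];  r.Industry = f[5];
        r.Company2 = f[6]; r.City2 = f[7]; r.State2 = f[8];
        r.CountryCode2 = f[9]; r.Street2 = f[10]; r.Industry2 = f[11];
        return r;
    }

    // Applies the two Columns mapping lines; each line yields one output row
    // with the columns Company, CountryCode, Address, Industry.
    static List<String[]> split(InputRow row1) {
        List<String[]> out = new ArrayList<>();
        out.add(new String[] {
            row1.Company, row1.CountryCode,
            row1.Street + "," + row1.City + "," + row1.State,
            row1.Industry });
        out.add(new String[] {
            row1.Company2, row1.CountryCode2,
            row1.Street2 + "," + row1.City2 + "," + row1.State2,
            row1.Industry2 });
        return out;
    }

    public static void main(String[] args) {
        InputRow row1 = parse("Talend;LA;California;537;5thAvenue;IT;"
            + "Lionbridge;Memphis;Tennessee;537;Lincoln Road;IT Service;");
        for (String[] row : split(row1)) {
            System.out.println(String.join("|", row));
        }
        // prints:
        // Talend|537|5thAvenue,LA,California|IT
        // Lionbridge|537|Lincoln Road,Memphis,Tennessee|IT Service
    }
}
```

Each entry added in split corresponds to one line of the Columns mapping table, which is why adding a third line to that table would produce a third output row per input row.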