tJavaRow - 6.3

Talend Components Reference Guide

Function

tJavaRow allows you to enter customized code that you can integrate into a Talend program. With tJavaRow, you enter the Java code to be applied to each row of the flow.

Purpose

tJavaRow allows you to broaden the functionality of Talend Jobs, using the Java language.

Depending on the Talend solution you are using, this component can be used in one, some or all of the Job frameworks available in that solution.

tJavaRow properties

Component Family

Custom Code

 

Basic settings

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

This component offers the advantage of the dynamic schema feature. This allows you to retrieve unknown columns from source files or to copy batches of columns from a source without mapping each column individually. For further information about dynamic schemas, see Talend Studio User Guide.

The dynamic schema feature is designed for retrieving unknown columns of a table and should be used for that purpose only; it is not recommended for creating tables.

 

 

Built-In: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.

 

 

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: see Talend Studio User Guide.

When the schema to be reused has default values that are integers or functions, ensure that these default values are not enclosed within quotation marks. If they are, you must remove the quotation marks manually.

For more details, see the article Verifying default values in a retrieved schema on Talend Help Center (https://help.talend.com).

  

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

Click Sync columns to retrieve the schema from the previous component connected in the Job.

 

Generate code

Click this button to automatically generate the code in the Code field to map the columns of the input schema with those of the output schema. This generation does not change anything in your schema.

The principle of this mapping is to relate the columns that have the same column name. Then you can adapt the generated code depending on the actual map you need.
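As a rough illustration of what this name-based mapping amounts to, the sketch below models the two rows as plain Java classes and copies each same-named column across. The names `GenerateCodeSketch`, `InputRow`, and `OutputRow` are hypothetical stand-ins so that the example runs outside the Studio; in the Code field itself only the `output_row.X = input_row.X;` assignments would appear, and the exact text the Studio generates may differ.

```java
public class GenerateCodeSketch {
    // Hypothetical stand-ins for the row structures derived from the schemas.
    public static class InputRow {
        public String City;
        public Integer Population;
    }

    public static class OutputRow {
        public String City;
        public Integer Population;
    }

    // Copies each input column to the output column that has the same name.
    public static OutputRow map(InputRow input_row) {
        OutputRow output_row = new OutputRow();
        output_row.City = input_row.City;
        output_row.Population = input_row.Population;
        return output_row;
    }

    public static void main(String[] args) {
        InputRow row = new InputRow();
        row.City = "Seoul";
        row.Population = 10422000;
        OutputRow result = map(row);
        System.out.println(result.City + ";" + result.Population);
    }
}
```

Once the one-to-one assignments are in place, any of them can be edited by hand, for example to transform a value before it is written to the output column.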

 

Code

Enter the Java code to be applied to each line of the data flow.

Advanced settings

Import

Enter the Java code to import, if necessary, external libraries used in the Code field of the Basic settings view.

 

tStatCatcher Statistics

Select this check box to collect the log data at the component level.

Global Variables

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.

To enter a global variable (for example COUNT of tFileRowCount) in the Code box, you need to type in the entire piece of code manually, that is to say ((Integer)globalMap.get("tFileRowCount_COUNT")).
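The lookup-and-cast pattern above can be tried outside the Studio with an ordinary map. This is a minimal sketch under stated assumptions: `GlobalMapSketch` and `readCount` are hypothetical names, a plain `HashMap` stands in for the Job's `globalMap`, and falling back to 0 when the key is missing is an illustrative choice rather than documented Talend behavior.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapSketch {
    // Reads the counter with the same lookup and cast used in the Code field.
    public static int readCount(Map<String, Object> globalMap) {
        Integer count = (Integer) globalMap.get("tFileRowCount_COUNT");
        // The key may be absent if the producing component has not run yet,
        // so this sketch falls back to 0 instead of unboxing a null.
        return count == null ? 0 : count;
    }

    public static void main(String[] args) {
        // A plain map standing in for the Job's globalMap (assumption).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileRowCount_COUNT", 5);
        System.out.println(readCount(globalMap));
    }
}
```

The cast is needed because the map stores values as `Object`; a null check before unboxing avoids a `NullPointerException` when the variable has not been set.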

Usage

This component is used as an intermediary between two other components. It must be linked to both an input and an output component.

Limitation

Knowledge of the Java language is necessary.

Scenario: Transforming data line by line using tJavaRow

In this scenario, the information of a few cities read from an input delimited file is transformed using Java code through the tJavaRow component and printed on the console.

Setting up the Job

  1. Drop a tFileInputDelimited component and a tJavaRow component from the Palette onto the design workspace, and label them to better identify their roles in the Job.

  2. Connect the two components using a Row > Main connection.

Configuring the components

  1. Double-click the tFileInputDelimited component to display its Basic settings view in the Component tab.

  2. In the File name/Stream field, type in the path to the input file in double quotation marks, or browse to the path by clicking the [...] button, and define the first line of the file as the header.

    In this example, the input file has the following content:

    City;Population;LandArea;PopDensity
    Beijing;10233000;1418;7620
    Moscow;10452000;1081;9644
    Seoul;10422000;605;17215
    Tokyo;8731000;617;14151
    New York;8310000;789;10452

  3. Click the [...] button next to Edit schema to open the [Schema] dialog box, and define the data structure of the input file. Then, click OK to validate the schema setting and close the dialog box.

  4. Double-click the tJavaRow component to display its Basic settings view in the Component tab.

  5. Click Sync columns to make sure that the schema is correctly retrieved from the preceding component.

  6. In the Code field, enter the code to be applied on each line of data based on the defined schema columns.

    In this example, we want to transform the city names to upper case, group digits of numbers larger than 1000 using the thousands separator for ease of reading, and print the data on the console:

    System.out.print("\n" + input_row.City.toUpperCase() + ":");
    System.out.print("\n - Population: " 
    + FormatterUtils.format_Number(String.valueOf(input_row.Population), ',', '.') + " people");
    System.out.print("\n - Land area: " 
    + FormatterUtils.format_Number(String.valueOf(input_row.LandArea), ',', '.') 
    + " km2");
    System.out.print("\n - Population density: " 
    + FormatterUtils.format_Number(String.valueOf(input_row.PopDensity), ',', '.') + " people/km2\n");

    Note

    In the Code field, input_row refers to the link that connects to tJavaRow.
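To experiment with the same transformation outside the Studio, the following sketch reproduces the idea with standard JDK classes only. It is an approximation, not the Talend routine: `CityFormatSketch`, `group`, and `describe` are hypothetical names, and `java.text.DecimalFormat` is swapped in for `FormatterUtils.format_Number`, so the two are not guaranteed to agree for every input.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class CityFormatSketch {
    // Groups the digits of a whole number in threes, using ',' as the separator.
    public static String group(long value) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator(',');
        symbols.setDecimalSeparator('.');
        return new DecimalFormat("#,##0", symbols).format(value);
    }

    // Builds the text for one city: upper-cased name plus the grouped figures.
    public static String describe(String city, long population, long landArea, long popDensity) {
        return city.toUpperCase(Locale.ROOT) + ":"
            + "\n - Population: " + group(population) + " people"
            + "\n - Land area: " + group(landArea) + " km2"
            + "\n - Population density: " + group(popDensity) + " people/km2";
    }

    public static void main(String[] args) {
        // First data row of the sample input file.
        System.out.println(describe("Beijing", 10233000L, 1418L, 7620L));
    }
}
```

Values below 1000, such as the land area of Seoul (605), pass through unchanged because there is nothing to group.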

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 or click Run on the Run tab to execute the Job.

    The city information is transformed by the Java code set through tJavaRow and displayed on the console.

Scenario: Using tJavaRow to handle file content based on a dynamic schema

This scenario describes a three-component Job that uses Java code through a tJavaRow component to display the content of an input file and pass it to the output component. Because all the components in this Job support the dynamic schema feature, we can use it to avoid configuring each column of the schema individually.

Setting up the Job

  1. Drop tFileInputDelimited, tJavaRow, and tFileOutputDelimited from the Palette onto the design workspace, and label them according to their roles in the Job.

  2. Connect the components in a series using Row > Main links.

Configuring the input and output components

  1. Double-click the tFileInputDelimited component, which is labeled Source, to display its Basic settings view.

    Warning

    The dynamic schema feature is only supported in Built-In mode and requires the input file to have a header row.

  2. In the File name/Stream field, type in the path to the input file in double quotation marks, or browse to the path by clicking the [...] button.

  3. In the Header field, type in 1 to define the first line of the file as the header.

  4. Click the [...] button next to Edit schema to open the [Schema] dialog box.

  5. Click the [+] button to add a column, give a name to the column, dyna in this example, and select Dynamic from the Type list. This dynamic column will retrieve the three columns, FirstName, LastName and Address, of the input file.

  6. Click OK to validate the setting and close the dialog box.

  7. Double-click the tFileOutputDelimited component, which is labeled Target, to display its Basic settings view.

  8. Define the output file path in the File Name field.

  9. Select the Include Header check box to include the header in the output file. Leave all the other settings as they are.

Configuring the tJavaRow component

  1. Double-click tJavaRow to display its Basic settings view and define the component's properties.

  2. Click Sync columns to make sure that the schema is correctly retrieved from the preceding component.

  3. In the Code field, enter the following code to display the content of the input file and pass the data to the next component based on the defined dynamic schema column:

    System.out.println(input_row.dyna);
    output_row.dyna = input_row.dyna;

    Note

    In the Code field, input_row and output_row correspond to the links to and from tJavaRow.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 or click Run on the Run tab to execute the Job.

    The content of the input file is displayed on the console and written to the output file.