tJavaMR properties - 6.1

Talend Components Reference Guide

Component family

Custom Code

 

Basic settings

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

  

Built-In: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.

  

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: see Talend Studio User Guide.

  

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

 

Map only

Select this check box to edit and use a custom mapper only. In that situation, only the Map code editing field and the Map advanced code area are available.

 

Map code

Enter the body of the map method you want to execute.

This component automatically defines the other parts of the map method. If you are not using Map only, tJavaMR also uses the column names you specify in the mrKeyStruct and mrValueStruct tables to instantiate the intermediate key/value pairs that pass from the map phase to the reduce phase.

For example, if you enter word as a column name in the mrKeyStruct table, write mrKey.word in your code to refer to the corresponding key instance. At runtime, this instance is automatically constructed as mrKey_component_ID.word, for example mrKey_tJavaMR_1.word.

Note that the text displayed above the Map code editing field indicates the parameters you can directly use in writing the code, such as mrKey, mrValue or outputRow.
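
As an illustration, a word-count map body might look like the minimal sketch below. Only mrKey, mrValue and the word column come from the description above. The count column, the stand-in classes, the line parameter and the emitted list are assumptions added so that the snippet runs on its own; in a real Job, tJavaMR generates the surrounding method and the way pairs are emitted may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Stand-in for the structure built from the mrKeyStruct table (column: word).
class MapKeySketch {
    String word;
}

// Stand-in for the structure built from the mrValueStruct table (assumed column: count).
class MapValueSketch {
    int count;
}

class MapBodySketch {
    // Hypothetical wrapper that plays the role of the generated map method.
    // Emitted pairs are recorded as "word=count" strings instead of being sent to Hadoop.
    static List<String> map(String line) {
        List<String> emitted = new ArrayList<>();
        MapKeySketch mrKey = new MapKeySketch();
        MapValueSketch mrValue = new MapValueSketch();

        // --- the kind of code you would type into the Map code field ---
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            mrKey.word = tokenizer.nextToken(); // key column declared in mrKeyStruct
            mrValue.count = 1;                  // value column declared in mrValueStruct
            emitted.add(mrKey.word + "=" + mrValue.count);
        }
        // --- end of the map body ---

        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(map("to be or not to be"));
    }
}
```

Each token produces one intermediate pair with a count of 1; adding up the counts per word is left to the reduce phase.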

For further information about a map method and the intermediate key/value pairs it outputs, see https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/Mapper.html.

For further information about Java function syntax specific to Talend, see Talend Studio Help Contents (Help > Developer Guide > API Reference).

For a complete Java reference, check http://docs.oracle.com/javaee/6/api/.

 

mrKeyStruct and mrValueStruct

In these two tables, add the columns you want to use to compose the key/value pairs required by MapReduce computations.

 

Reduce code

Enter the body of the reduce method you want to execute according to the task you need to perform.

This component automatically defines the shuffle and sort phases and the other parts of the reduce method. It also uses the column names you specify in the mrKeyStruct and mrValueStruct tables to instantiate the key/value pairs that have been shuffled and sorted.

For example, if you enter word as a column name in the mrKeyStruct table, write key.word in your code to refer to the corresponding key instance. At runtime, this instance is automatically constructed as key_component_ID.word, for example key_tJavaMR_1.word. Similarly, if you enter count in the mrValueStruct table, write values.count to refer to the corresponding value instance. At runtime, this instance is constructed as values_component_ID.count, for example values_tJavaMR_1.count.

Note that the text displayed above the Reduce code editing field indicates the parameters you can directly use in writing the code.
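
To make the naming concrete, here is a minimal sketch of a reduce body that sums the counts received for one word. The names key.word and values.count follow the example above. The stand-in classes, the grouped iterator and the returned string are assumptions made so that the snippet is runnable; this section does not show how the generated code exposes the grouped values, so treat the loop as illustrative only.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for the key structure built from mrKeyStruct (column: word).
class ReduceKeySketch {
    String word;
}

// Stand-in for the value structure built from mrValueStruct (column: count).
class ReduceValueSketch {
    int count;

    ReduceValueSketch(int count) {
        this.count = count;
    }
}

class ReduceBodySketch {
    // Hypothetical wrapper that plays the role of the generated reduce method.
    static String reduce(ReduceKeySketch key, Iterator<ReduceValueSketch> grouped) {
        // --- the kind of code you would type into the Reduce code field ---
        int sum = 0;
        while (grouped.hasNext()) {
            ReduceValueSketch values = grouped.next(); // one shuffled and sorted value
            sum += values.count;
        }
        // --- end of the reduce body ---

        return key.word + "=" + sum;
    }

    public static void main(String[] args) {
        ReduceKeySketch key = new ReduceKeySketch();
        key.word = "be";
        List<ReduceValueSketch> grouped =
                Arrays.asList(new ReduceValueSketch(1), new ReduceValueSketch(1));
        System.out.println(reduce(key, grouped.iterator()));
    }
}
```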

For further information about a reduce method and its related phases and key/value pairs, see https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/Reducer.html.

For further information about Java function syntax specific to Talend, see Talend Studio Help Contents (Help > Developer Guide > API Reference).

For a complete Java reference, check http://docs.oracle.com/javaee/6/api/.

Advanced settings

Map advanced code

This area allows you to define the classes, variables and methods that you want to use along with the map method defined in the Basic settings view. Note that the advanced code is not required for using tJavaMR.

Three fields are available for this purpose:

  • Implement the prepare code: select this check box and in the displayed field, define variables, methods and inner classes to be nested in the body of the public class of this component's mapper.

  • Implement the configure method: select this check box and in the displayed field, define the body of the configure method of this component's mapper.

  • Implement the close method: select this check box and in the displayed field, define the body of the close method of this component's mapper.
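
The following minimal sketch suggests how the three pieces could fit together in a mapper class. It is plain Java written to run on its own, not the code tJavaMR generates: the class name, the sketch.separator setting and the log list are hypothetical, and a java.util.Map stands in for the job configuration object that Hadoop would pass to the configure method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class MapperLifecycleSketch {
    // --- prepare code: fields and helper methods nested in the mapper class ---
    private String separator = ",";
    private int processed = 0;
    final List<String> log = new ArrayList<>();

    private String normalize(String token) {
        return token.trim().toLowerCase();
    }

    // --- configure method body: runs once, before any record is mapped ---
    void configure(Map<String, String> jobSettings) {
        separator = jobSettings.getOrDefault("sketch.separator", ",");
        log.add("configure");
    }

    // --- map method: uses what the prepare and configure code set up ---
    List<String> map(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.split(Pattern.quote(separator))) {
            tokens.add(normalize(token));
            processed++;
        }
        return tokens;
    }

    // --- close method body: runs once, after the last record ---
    void close() {
        log.add("close after " + processed + " tokens");
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put("sketch.separator", ";");

        MapperLifecycleSketch mapper = new MapperLifecycleSketch();
        mapper.configure(settings);
        System.out.println(mapper.map(" A ;b "));
        mapper.close();
        System.out.println(mapper.log);
    }
}
```

Keeping shared state and helpers in the prepare code, one-time setup in configure and cleanup in close keeps the map body itself short.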

 

Reduce advanced code

This area allows you to define the classes, variables and methods that you want to use along with the reduce method defined in the Basic settings view. Note that the advanced code is not required for using tJavaMR.

Three fields are available for this purpose:

  • Implement the prepare code: select this check box and in the displayed field, define variables, methods and inner classes to be nested in the body of the public class of this component's reducer.

  • Implement the configure method: select this check box and in the displayed field, define the body of the configure method of this component's reducer.

  • Implement the close method: select this check box and in the displayed field, define the body of the close method of this component's reducer.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. It is available only if the component has a Die on error check box and that check box is cleared.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
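
As a rough sketch of reading such a variable in Java code, the snippet below assumes the usual Talend convention of looking a variable up in a map named globalMap under a key made of the component ID, an underscore and the variable name. The HashMap is a stand-in for that map, and the helper name, the tJavaMR_1 ID and the sample message are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVariableSketch {
    // Returns the ERROR_MESSAGE value for a component, or an empty string if none is set.
    static String errorMessageOf(Map<String, Object> globalMap, String componentId) {
        Object message = globalMap.get(componentId + "_ERROR_MESSAGE");
        return message == null ? "" : (String) message;
    }

    public static void main(String[] args) {
        // Stand-in for the Job's global variable map (assumption).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tJavaMR_1_ERROR_MESSAGE", "sample failure");

        System.out.println(errorMessageOf(globalMap, "tJavaMR_1"));
    }
}
```

Because ERROR_MESSAGE is an After variable, a lookup like this only makes sense in a part of the Job that runs after the component has finished.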

For further information about variables, see Talend Studio User Guide.

Usage

Once a Map/Reduce Job is opened in the workspace, tJavaMR appears in the Palette of the Studio. It is used as an intermediate step in a Map/Reduce Job.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is, traditional Talend data integration Jobs, not Map/Reduce Jobs.

Hadoop Connection

You need to use the Hadoop Configuration tab in the Run view to define the connection to a given Hadoop distribution for the whole Job.

This connection is effective on a per-Job basis.