Configuring join operations - 6.1

Talend Big Data Studio User Guide

On the input side, click the button on the appropriate lookup table to display the panel used for setting the join options.

Lookup properties

Value

Join Model

Inner Join;

Left Outer Join;

Right Outer Join;

Full Outer Join.

If you do not display this settings panel to activate these options, the default join model is Left Outer Join. These options join two or more flows based on the values of their common fields.
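The semantics of the four join models can be illustrated with a small in-memory sketch. This is not how tPigMap or Pig executes a join; it only shows which rows each model keeps. The function name `join_flows`, the model names `"inner"`, `"left"`, `"right"` and `"full"`, and the sample flows are hypothetical, and a missing side is represented by `None`, standing in for the null values an outer join produces.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Pair = Tuple[Optional[Row], Optional[Row]]

def join_flows(main: List[Row], lookup: List[Row], key: str, model: str) -> List[Pair]:
    """Join two flows on `key` (hypothetical helper, for illustration only).

    model is one of "inner", "left", "right", "full". A missing side is
    returned as None, standing in for the nulls of an outer join."""
    if model not in ("inner", "left", "right", "full"):
        raise ValueError(f"unknown join model: {model}")
    # Index the lookup flow by its key value.
    index: Dict[Any, List[Row]] = {}
    for row in lookup:
        index.setdefault(row[key], []).append(row)

    result: List[Pair] = []
    matched_keys = set()
    for row in main:
        matches = index.get(row[key], [])
        if matches:
            matched_keys.add(row[key])
            result.extend((row, m) for m in matches)
        elif model in ("left", "full"):
            # Keep main rows that have no match.
            result.append((row, None))
    if model in ("right", "full"):
        # Keep lookup rows that no main row matched.
        result.extend((None, m) for m in lookup if m[key] not in matched_keys)
    return result

main = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
lookup = [{"id": 2, "city": "x"}, {"id": 3, "city": "y"}]
for model in ("inner", "left", "right", "full"):
    print(model, join_flows(main, lookup, "id", model))
```

With these sample flows, Inner Join keeps only the row with id 2, Left Outer Join also keeps the main row with id 1, Right Outer Join also keeps the lookup row with id 3, and Full Outer Join keeps all three.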

When more than one lookup table needs to be joined, the main input flow is first joined with the first lookup flow; the result of that join is then joined with the second lookup flow, and so on until the last lookup flow has been joined.
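The chained joining of several lookup flows can be sketched as a fold: each step joins the running result with the next lookup flow. The sketch assumes Left Outer Join for every step and lets each lookup flow use its own join key; the helper names and the sample orders, customers and countries flows are hypothetical. Because the second join uses the `country` column that only the first join supplies, it also shows why the order of the lookup flows matters.

```python
from functools import reduce
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

def left_outer_join(main: List[Row], lookup: List[Row], key: str) -> List[Row]:
    """Keep every main row; merge in the columns of each matching lookup row."""
    index: Dict[Any, List[Row]] = {}
    for row in lookup:
        index.setdefault(row[key], []).append(row)
    out: List[Row] = []
    for row in main:
        # A row that lacks the key column (e.g. after an unmatched earlier join) finds no match.
        matches = index.get(row.get(key), [])
        if matches:
            # Lookup columns overwrite main columns of the same name.
            out.extend({**row, **m} for m in matches)
        else:
            out.append(dict(row))
    return out

def join_lookups(main: List[Row], lookups: List[Tuple[List[Row], str]]) -> List[Row]:
    """Join main with the first lookup, then that result with the second, and so on."""
    return reduce(lambda acc, lk: left_outer_join(acc, lk[0], lk[1]), lookups, main)

orders = [{"order": 1, "cust": "c1"}, {"order": 2, "cust": "c9"}]
customers = [{"cust": "c1", "country": "FR"}]
countries = [{"country": "FR", "region": "EU"}]
print(join_lookups(orders, [(customers, "cust"), (countries, "country")]))
```

Joining `countries` before `customers` would leave `region` empty for every order, because the `country` column does not exist yet when the countries lookup is joined.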

Join Optimization

None;

Replicated;

Skewed;

Merge.

If you do not display this settings panel to activate these options, the default join optimization is None. These options are used to make join operations more efficient. For example, when the join runs in parallel over multiple reduce tasks and the data to be processed is sufficiently skewed, the Skewed join can be used to counteract the resulting load imbalance among the reduce tasks.

Each of these options is subject to the constraints explained in Apache's documentation about Pig Latin.
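The load imbalance that the Skewed option addresses can be illustrated by counting how many records each reduce task receives when all records sharing a key go to the same reduce task. The sketch is a simplification loosely based on the idea behind Pig's skewed join, not its actual algorithm: records of any key occurring more than a chosen threshold are dealt round-robin over all reduce tasks. In a real join, the matching records from the other input would also have to be sent to every reduce task holding part of that key. The function names, the threshold and the use of CRC-32 as the hash are assumptions for illustration.

```python
import zlib
from collections import Counter
from typing import List

def partition(key: str, n: int) -> int:
    # Stable hash so results are reproducible (Python's hash() is salted per process).
    return zlib.crc32(key.encode("utf-8")) % n

def reducer_loads(keys: List[str], n: int) -> List[int]:
    """Records per reduce task when every record of a key goes to one task."""
    loads = [0] * n
    for k in keys:
        loads[partition(k, n)] += 1
    return loads

def split_hot_key_loads(keys: List[str], n: int, threshold: int) -> List[int]:
    """Simplified skew handling: records of keys occurring more than
    `threshold` times are dealt round-robin over all reduce tasks."""
    counts = Counter(keys)
    loads = [0] * n
    rr = 0
    for k in keys:
        if counts[k] > threshold:
            loads[rr % n] += 1
            rr += 1
        else:
            loads[partition(k, n)] += 1
    return loads

keys = ["hot"] * 90 + [f"k{i}" for i in range(10)]  # one heavily repeated key
print(reducer_loads(keys, 4))
print(split_hot_key_loads(keys, 4, threshold=10))
```

Without the split, one reduce task receives at least the 90 records of the hot key; with it, no task receives more than 23 hot-key records plus whatever cold keys hash to it.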

Custom Partitioner

Enter the Hadoop partitioner to be used to control the partitioning of the keys of the intermediate map outputs. For example, to use the partitioner SimpleCustomPartitioner, enter the following, in double quotation marks:

"org.apache.pig.test.utils.SimpleCustomPartitioner"

The jar file of this partitioner must have been registered in the Register jar table in the Advanced settings view of the tPigLoad component linked with the tPigMap component in use.

For more information about the code of SimpleCustomPartitioner, see Apache's documentation about Pig Latin.
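A partitioner maps the key of each intermediate map output record to the index of a reduce task. The sketch emulates in Python the formula used by Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, assuming for illustration that the key's `hashCode()` behaves like Java's `String.hashCode()`, next to a hypothetical custom partitioner that assigns keys to reduce tasks by alphabetical range of their first letter. It is not the code of SimpleCustomPartitioner, and the Python function names are hypothetical. Whatever its logic, a partitioner must return a value between 0 and the number of partitions minus 1, and must always send equal keys to the same partition so that matching records meet in the same reduce task.

```python
def java_string_hash(s: str) -> int:
    """Java's String.hashCode() for BMP-only strings, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap like Java int arithmetic
    return h - (1 << 32) if h >= (1 << 31) else h

def hash_partition(key: str, num_partitions: int) -> int:
    """Default HashPartitioner formula, assuming hashCode() == String.hashCode()."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_partitions

def first_letter_partition(key: str, num_partitions: int) -> int:
    """Hypothetical custom partitioner: split the letters a-z into contiguous
    ranges, one per partition; keys not starting with a letter go to partition 0."""
    c = key[:1]
    if c.isascii() and c.isalpha():
        idx = ord(c.lower()) - ord("a")
    else:
        idx = 0
    return idx * num_partitions // 26  # always in [0, num_partitions)

for k in ["apple", "mango", "zebra", "hello"]:
    print(k, hash_partition(k, 4), first_letter_partition(k, 4))
```

A range-based partitioner like this keeps alphabetically close keys together, at the cost of uneven loads when keys cluster around a few letters; the hash-based formula spreads keys more evenly but in no particular order.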

Increase Parallelism

Enter the number of reduce tasks for the Hadoop MapReduce jobs generated by Pig. For further information about the parallelism of reduce tasks, see Apache's documentation about Pig Latin.
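Because all records that share a key go to the same reduce task, increasing the number of reduce tasks only helps while there are enough distinct keys to spread over them. The sketch counts the records each reduce task receives under hash partitioning; CRC-32 is used as a stand-in hash, and the function name and sample keys are hypothetical.

```python
import zlib
from typing import List

def reducer_loads(keys: List[str], num_reducers: int) -> List[int]:
    """Records received by each reduce task; equal keys always share a task."""
    loads = [0] * num_reducers
    for k in keys:
        loads[zlib.crc32(k.encode("utf-8")) % num_reducers] += 1
    return loads

keys = ["a", "b", "c"] * 5  # 15 records, only 3 distinct keys
for n in (1, 2, 10):
    loads = reducer_loads(keys, n)
    busy = sum(1 for x in loads if x)
    print(f"{n} reduce tasks: loads={loads}, busy={busy}")
```

With ten reduce tasks, at most three of them receive any records in this example, so at least seven stay idle.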