Optional map settings

Pig

author: Talend Documentation Team
EnrichVersion: 6.5
EnrichProdName: Talend Data Fabric, Talend Open Studio for Big Data, Talend Big Data Platform, Talend Big Data, Talend Real-Time Big Data Platform
EnrichPlatform: Talend Studio
On the input side:

Lookup properties

Value

Join Model

Inner Join;

Left Outer Join;

Right Outer Join;

Full Outer Join.

If you do not display this settings panel to change the option, Left Outer Join is used by default. These options join two or more flows based on common field values.

When more than one lookup table needs joining, the main input flow is first joined with the first lookup flow; the result is then joined with the second lookup flow, and so on in the same manner until the last lookup flow is joined.
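The four join models and the chaining of several lookups can be pictured with a small Python sketch. It is only an illustration of the semantics described above, not the code that tPigMap generates: the helpers join_flows and join_lookups, the sample flows and the field names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[object]]


def join_flows(left: List[Row], right: List[Row], key: str,
               model: str = "left_outer") -> List[Row]:
    """Join two flows (lists of dict rows) on a common field.

    Hypothetical helper for illustration; not a Talend or Pig API.
    model is one of: inner, left_outer, right_outer, full_outer.
    """
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}
    joined: List[Row] = []
    matched_right = set()  # indexes of right rows that found a partner

    for l_row in left:
        matches = [i for i, r_row in enumerate(right) if r_row[key] == l_row[key]]
        for i in matches:
            matched_right.add(i)
            joined.append({**l_row, **right[i]})
        if not matches and model in ("left_outer", "full_outer"):
            # Keep the unmatched left row, padding the lookup columns with None.
            joined.append({**{col: None for col in right_cols}, **l_row})

    if model in ("right_outer", "full_outer"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                # Keep the unmatched right row, padding the main columns with None.
                joined.append({**{col: None for col in left_cols}, **r_row})
    return joined


def join_lookups(main: List[Row],
                 lookups: List[Tuple[List[Row], str, str]]) -> List[Row]:
    """Join the main flow with each lookup in order, reusing the previous result."""
    result = main
    for lookup_rows, key, model in lookups:
        result = join_flows(result, lookup_rows, key, model)
    return result


# Made-up sample flows.
main_flow = [{"id": 1, "city_id": 10}, {"id": 2, "city_id": 20}, {"id": 3, "city_id": 30}]
cities = [{"city_id": 10, "city": "Paris"}, {"city_id": 20, "city": "Lyon"}]
countries = [{"city": "Paris", "country": "France"}]

result = join_lookups(main_flow, [(cities, "city_id", "left_outer"),
                                  (countries, "city", "left_outer")])
```

With Left Outer Join on both lookups, every row of the main flow survives and the fields of a missing lookup match are left empty; switching a lookup to "inner" would drop the unmatched rows instead.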

Join Optimization

None;

Replicated;

Skewed;

Merge.

If you do not display this settings panel to change the option, None is used by default. These options make join operations more efficient. For example, when the join runs across multiple parallel reduce tasks and the data to be processed is sufficiently skewed, the Skewed join can be used to counteract the resulting load imbalance.

Each of these options is subject to the constraints explained in Apache's documentation about Pig Latin.
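To see why skewed data causes a load imbalance across reduce tasks, consider the following toy model in Python. It is a simplified sketch and does not reproduce how Pig implements the Skewed join: the functions reducer_loads and reducer_loads_spreading, the modulo assignment of keys and the sample keys are assumptions made for illustration.

```python
from typing import Iterable, List, Set


def reducer_loads(keys: Iterable[int], num_reducers: int) -> List[int]:
    """Count records per reduce task when each key goes to key % num_reducers."""
    loads = [0] * num_reducers
    for key in keys:
        loads[key % num_reducers] += 1
    return loads


def reducer_loads_spreading(keys: Iterable[int], num_reducers: int,
                            hot_keys: Set[int]) -> List[int]:
    """Same count, but records of known hot keys are spread round-robin."""
    loads = [0] * num_reducers
    next_slot = 0
    for key in keys:
        if key in hot_keys:
            loads[next_slot % num_reducers] += 1
            next_slot += 1
        else:
            loads[key % num_reducers] += 1
    return loads


# 100 made-up records: key 7 accounts for 90 of them.
keys = [7] * 90 + [1, 2, 3, 4, 5, 6, 8, 9, 10, 11]

plain = reducer_loads(keys, 4)
spread = reducer_loads_spreading(keys, 4, hot_keys={7})
```

In this toy data set, the plain assignment sends 92 of the 100 records to a single reduce task, while spreading the hot key gives each task roughly a quarter of the records.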

Custom Partitioner

Enter the Hadoop partitioner to use to control how the keys of the intermediate map outputs are partitioned. For example, enter "org.apache.pig.test.utils.SimpleCustomPartitioner", including the double quotation marks, to use the partitioner SimpleCustomPartitioner.

For further information about the code of SimpleCustomPartitioner, see Apache's documentation about Pig Latin. The jar file of the partitioner must be registered in the Register jar table in the Advanced settings view of the tPigLoad component that is linked with the tPigMap component to be used.
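Conceptually, a partitioner is a function that maps each intermediate key to the index of the reduce task that will process it. The Python sketch below illustrates that idea only. A real Hadoop partitioner is a Java class, and this sketch is not a transcription of SimpleCustomPartitioner: the function get_partition and its hashing choice are hypothetical.

```python
import zlib


def get_partition(key: object, num_partitions: int) -> int:
    """Return the index of the reduce task for a key (illustrative only)."""
    if isinstance(key, int):
        # Integer keys: use the value itself, folded into the partition range.
        return key % num_partitions
    # Other keys: use a hash that is stable across runs, unlike Python's hash().
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


# Records with the same key always land on the same reduce task.
assignments = {key: get_partition(key, 4) for key in [10, 11, 14, "paris", "lyon"]}
```

Whatever rule the partitioner applies, it must be deterministic and must return an index between 0 and the number of reduce tasks minus one.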

Increase Parallelism

Enter the number of reduce tasks. For further information about the parallel features, see Apache's documentation about Pig Latin.

On the output side:

Output properties

Value

Catch Output Reject

True;

False.

Once activated, this option allows you to catch the records rejected by a filter that you define in the appropriate area.

Catch Lookup Inner Join Reject

True;

False.

Once activated, this option allows you to catch the records rejected by the inner join operation performed on the input flows.
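The two reject options can be pictured as follows. This Python sketch only illustrates which records each option would catch, under the assumption that a filter reject is a record failing the filter condition and an inner join reject is a main record with no match in the lookup flow. The helpers split_by_filter and inner_join_with_rejects and the sample data are hypothetical, not part of Talend or Pig.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def split_by_filter(rows: List[Row],
                    predicate: Callable[[Row], bool]) -> Tuple[List[Row], List[Row]]:
    """Return (accepted, rejected) rows for a filter condition."""
    accepted = [row for row in rows if predicate(row)]
    rejected = [row for row in rows if not predicate(row)]
    return accepted, rejected


def inner_join_with_rejects(main: List[Row], lookup: List[Row],
                            key: str) -> Tuple[List[Row], List[Row]]:
    """Inner join main with lookup on key; also return the unmatched main rows."""
    index: Dict[object, List[Row]] = {}
    for row in lookup:
        index.setdefault(row[key], []).append(row)

    joined: List[Row] = []
    rejected: List[Row] = []
    for row in main:
        matches = index.get(row[key], [])
        if matches:
            joined.extend({**row, **match} for match in matches)
        else:
            rejected.append(row)  # no lookup match: inner join reject
    return joined, rejected


# Made-up sample flows.
orders = [{"id": 1, "amount": 50, "cust": "a"},
          {"id": 2, "amount": 5, "cust": "b"},
          {"id": 3, "amount": 70, "cust": "z"}]
customers = [{"cust": "a", "name": "Ann"}, {"cust": "b", "name": "Bob"}]

accepted, filter_rejects = split_by_filter(orders, lambda row: row["amount"] >= 10)
joined, join_rejects = inner_join_with_rejects(orders, customers, "cust")
```

Here the filter rejects order 2 because its amount is below the threshold, while the inner join rejects order 3 because customer "z" does not exist in the lookup flow.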