Joining two files based on an exact match and saving the result to a local file

Pig

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Data Fabric
Talend Open Studio for Big Data
Talend Big Data Platform
Talend Big Data
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

This scenario applies only to Talend products with Big Data.

For more technologies supported by Talend, see Talend components.

This scenario describes a four-component Job that joins an input file with a reference file on an exact match of a given join key, removes unwanted columns, and then saves the final result to a local file.

The main input file lists people's IDs, first names, last names, group IDs, and salaries in semicolon-separated fields, as shown below:

1;Woodrow;Johnson;3;1013.39
2;Millard;Monroe;2;8077.59
3;Calvin;Eisenhower;3;6866.88
4;Lyndon;Wilson;3;5726.28
5;Ronald;Garfield;2;4158.58
6;Rutherford;Buchanan;3;2897.00
7;Calvin;Coolidge;1;6650.66
8;Ulysses;Roosevelt;2;7854.78
9;Grover;Tyler;1;5226.88
10;Bill;Tyler;2;8964.66

The reference file contains only group IDs and group names:

1;group_A
2;group_B
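
The following Python sketch is not part of the Talend Job. It only illustrates what an exact-match join does to the sample records shown above. It assumes that the join key is the group ID, that records without a matching group are dropped (as in an inner join), and that the group ID is the column removed from the output. The function names join_exact and save, and the output file name, are hypothetical.

```python
import csv
import io

# Sample records copied from the two files shown above.
MAIN_INPUT = """\
1;Woodrow;Johnson;3;1013.39
2;Millard;Monroe;2;8077.59
3;Calvin;Eisenhower;3;6866.88
4;Lyndon;Wilson;3;5726.28
5;Ronald;Garfield;2;4158.58
6;Rutherford;Buchanan;3;2897.00
7;Calvin;Coolidge;1;6650.66
8;Ulysses;Roosevelt;2;7854.78
9;Grover;Tyler;1;5226.88
10;Bill;Tyler;2;8964.66
"""

REFERENCE = """\
1;group_A
2;group_B
"""


def read_records(text):
    """Parse semicolon-separated text into a list of rows, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text), delimiter=";") if row]


def join_exact(main_text, ref_text):
    """Join main records to reference records on an exact group ID match.

    Assumption: unmatched main records are dropped, and the group ID
    column is replaced by the group name in the output.
    """
    # Lookup table: group ID -> group name.
    groups = {gid: name for gid, name in read_records(ref_text)}
    joined = []
    for pid, first, last, gid, salary in read_records(main_text):
        if gid in groups:  # exact string match on the join key
            joined.append([pid, first, last, groups[gid], salary])
    return joined


def save(rows, path):
    """Write the joined rows to a local semicolon-separated file."""
    with open(path, "w", newline="") as out:
        csv.writer(out, delimiter=";").writerows(rows)


if __name__ == "__main__":
    # Hypothetical output file name, chosen for illustration only.
    save(join_exact(MAIN_INPUT, REFERENCE), "joined_result.csv")
```

Given only the two reference records shown above, the people in group 3 have no matching group and would not appear in the joined result under these assumptions.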