Filtering rows of data based on a condition and saving the result to a local file

Pig

author: Talend Documentation Team
EnrichVersion: 6.5
EnrichProdName: Talend Data Fabric; Talend Open Studio for Big Data; Talend Big Data Platform; Talend Big Data; Talend Real-Time Big Data Platform
task: Data Governance > Third-party systems > Processing components (Integration) > Pig components
task: Design and Development > Third-party systems > Processing components (Integration) > Pig components
task: Data Quality and Preparation > Third-party systems > Processing components (Integration) > Pig components
EnrichPlatform: Talend Studio

This scenario applies only to Talend products with Big Data.

For more technologies supported by Talend, see Talend components.

This scenario describes a four-component Job that filters a list of customers to find those from a particular country and saves the resulting list to a local file. Duplicate entries are removed from the list before the data is filtered.

The input file contains three semicolon-separated columns, Name, Country, and Age, and includes some duplicate entries, as shown below:

Mario;PuertoRico;49
Mike;USA;22
Ricky;PuertoRico;37
Silvia;Spain;20
Billy;Canada;21
Ricky;PuertoRico;37
Romeo;UK;19
Natasha;Russia;25
Juan;Cuba;23
Bob;Jamaica;55
Mario;PuertoRico;49
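
The processing this scenario describes (remove duplicates, filter on the Country column, then save the result locally) can be illustrated outside of Talend with a minimal Python sketch. This is not the Job itself or the Pig code it generates. The function names, the output path, and the choice of PuertoRico as the country to keep are assumptions made for illustration only. The sketch also keeps rows in their first-seen order, which a distinct operation in Pig does not guarantee.

```python
import csv
import io

# Sample input from the scenario: Name;Country;Age, with duplicate rows.
SAMPLE = """\
Mario;PuertoRico;49
Mike;USA;22
Ricky;PuertoRico;37
Silvia;Spain;20
Billy;Canada;21
Ricky;PuertoRico;37
Romeo;UK;19
Natasha;Russia;25
Juan;Cuba;23
Bob;Jamaica;55
Mario;PuertoRico;49
"""


def distinct_then_filter(text, country):
    """Drop duplicate rows (keeping the first occurrence), then keep one country.

    `country` is a hypothetical parameter; the scenario only says
    "a particular country".
    """
    seen = set()
    result = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # skip blank lines
        key = tuple(row)
        if key in seen:
            continue  # duplicate entry, already handled
        seen.add(key)
        name, row_country, age = row
        if row_country == country:
            result.append((name, row_country, int(age)))
    return result


def save_rows(rows, path):
    """Write the filtered rows to a local semicolon-delimited file."""
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter=";").writerows(rows)


if __name__ == "__main__":
    rows = distinct_then_filter(SAMPLE, "PuertoRico")  # assumed country
    save_rows(rows, "filtered_customers.csv")  # assumed output path
```

Deduplicating before filtering mirrors the order given in the scenario. For this sample data, filtering first and deduplicating afterwards would select the same rows.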