Linking the components

Pig

Author: Talend Documentation Team
Version: 6.5
Products: Talend Data Fabric, Talend Open Studio for Big Data, Talend Big Data Platform, Talend Big Data, Talend Real-Time Big Data Platform
Platform: Talend Studio

Procedure

  1. In the Integration perspective of Talend Studio, create an empty Job, named Replicate for example, from the Job Designs node in the Repository tree view.
    For further information about how to create a Job, see the Talend Studio User Guide.
  2. Drop one tPigLoad, one tPigReplicate, two tPigSort, and two tPigStoreResult components onto the workspace.
    The tPigLoad component reads data from the given HDFS file system. The sample data used in this scenario reads as follows:
    Andrew Kennedy;Mississippi
    Benjamin Carter;Louisiana
    Benjamin Monroe;West Virginia
    Bill Harrison;Tennessee
    Calvin Grant;Virginia
    Chester Harrison;Rhode Island
    Chester Hoover;Kansas
    Chester Kennedy;Maryland
    Chester Polk;Indiana
    Dwight Nixon;Nevada
    Dwight Roosevelt;Mississippi
    Franklin Grant;Nebraska
    The location of the data in this scenario is /user/ychen/raw/Name&State.csv.
  3. Connect the components using Row > Pig combine links.
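
To make the data flow behind this layout easier to picture, the following is a minimal Python sketch of what the linked components do conceptually: load the semicolon-separated sample rows, replicate the flow into two identical branches, and sort each branch independently. It is an illustration only, not the Pig code that the Studio generates. The function names (load, replicate, sort_by) are hypothetical, and the sort keys (first column for one branch, second column for the other) are assumptions, since this procedure does not state how each tPigSort is configured.

```python
import csv
import io

# Sample rows from step 2, semicolon-separated as in the source file.
SAMPLE = """\
Andrew Kennedy;Mississippi
Benjamin Carter;Louisiana
Benjamin Monroe;West Virginia
Bill Harrison;Tennessee
Calvin Grant;Virginia
Chester Harrison;Rhode Island
Chester Hoover;Kansas
Chester Kennedy;Maryland
Chester Polk;Indiana
Dwight Nixon;Nevada
Dwight Roosevelt;Mississippi
Franklin Grant;Nebraska
"""


def load(text, delimiter=";"):
    """Parse delimited text into tuples, standing in for the load step."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [tuple(row) for row in reader if row]


def replicate(rows, copies=2):
    """Return independent copies of the same flow, one per downstream branch."""
    return [list(rows) for _ in range(copies)]


def sort_by(rows, column):
    """Sort one branch in ascending order on the given column index (assumed key)."""
    return sorted(rows, key=lambda row: row[column])


rows = load(SAMPLE)
branch_a, branch_b = replicate(rows)

# Assumption: one branch is sorted on the first column, the other on the second.
by_first_column = sort_by(branch_a, 0)
by_second_column = sort_by(branch_b, 1)

for label, branch in (("first column", by_first_column),
                      ("second column", by_second_column)):
    print(f"Sorted by {label}:")
    for row in branch:
        print("  " + ";".join(row))
```

Each branch receives the full set of rows, so the two sorted outputs contain the same records in different orders. In the Job itself, each tPigStoreResult would write its branch to a separate location.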