Dropping and linking components - 7.1

Talend Open Studio for Big Data Getting Started Guide

The Pig components are placed in the Job workspace and linked together to compose a Pig process for data transformation.

Before you begin

  • You have launched your Talend Studio and opened the Integration perspective.

  • An empty Job has been created as described in Creating the Job and is open in the workspace.


  1. In the Job, enter the name of each component to be used and select it from the list that appears. In this scenario, the components are two tPigLoad components, one tPigMap component and two tPigStoreResult components.
    • The two tPigLoad components are used to load the movie data and the director data, respectively, from HDFS into the data flow of the current Job.

    • The tPigMap component is used to transform the input data.

    • The tPigStoreResult components write the results into given directories in HDFS.

  2. Double-click the label of one of the tPigLoad components to make the label editable, then enter movie to rename this tPigLoad component.
  3. Do the same to label the other tPigLoad component director.
  4. Right-click the tPigLoad component labelled movie, select Row > Pig combine from the contextual menu, then click tPigMap to connect this tPigLoad component to the tPigMap component. This is the main link through which the movie data is sent to tPigMap.
  5. Do the same to connect the director tPigLoad component to tPigMap using the Row > Pig combine link. This is the lookup link through which the director data is sent to tPigMap as lookup data.
  6. Do the same to connect the tPigMap component to one of the tPigStoreResult components using the Row > Pig combine link, then in the pop-up wizard, name this link out1 and click OK to validate the change.
  7. Repeat these operations to connect the tPigMap component to the other tPigStoreResult component using the Row > Pig combine link, and name this link reject.
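
The wiring produced by these steps can be summarized as a small graph. The Python sketch below is only an illustration of that topology, not something Talend Studio generates or reads. The Link class, the helper functions, and the labels store_out1 and store_reject for the two tPigStoreResult components are all hypothetical; the steps name only the out1 and reject links.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    source: str                 # label of the upstream component
    target: str                 # label of the downstream component
    role: str                   # "main", "lookup" or "output"
    name: Optional[str] = None  # link name, where the steps give one


# "store_out1" and "store_reject" are hypothetical labels for the two
# tPigStoreResult components; the steps above name only the links.
JOB_LINKS = [
    Link("movie", "tPigMap", role="main"),
    Link("director", "tPigMap", role="lookup"),
    Link("tPigMap", "store_out1", role="output", name="out1"),
    Link("tPigMap", "store_reject", role="output", name="reject"),
]


def links_into(component, links=JOB_LINKS):
    """Return the links whose target is the given component."""
    return [link for link in links if link.target == component]


def links_out_of(component, links=JOB_LINKS):
    """Return the links whose source is the given component."""
    return [link for link in links if link.source == component]


def check_pig_map_wiring(links=JOB_LINKS):
    """Check that tPigMap has one main input, one lookup input,
    and two outputs named out1 and reject."""
    roles_in = sorted(link.role for link in links_into("tPigMap", links))
    names_out = sorted(link.name or "" for link in links_out_of("tPigMap", links))
    return roles_in == ["lookup", "main"] and names_out == ["out1", "reject"]


if __name__ == "__main__":
    for link in JOB_LINKS:
        label = f" ({link.name})" if link.name else ""
        print(f"{link.source} --{link.role}{label}--> {link.target}")
    print("wiring ok:", check_pig_map_wiring())
```

A check of this kind can be handy when reviewing a Job design on paper: tPigMap should receive exactly one main flow and one lookup flow, and each output link should have a distinct name.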


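The whole Job is now laid out in the workspace: the movie and director tPigLoad components feed tPigMap through the main and lookup links, and tPigMap feeds the two tPigStoreResult components through the out1 and reject links.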