The Pig components to be used are orchestrated in the Job workspace to compose a
Pig process for data transformation.
In the Job, enter the name of the component to be used and select
this component from the list that appears. In this scenario, the components are
two tPigLoad components, a tPigMap component and two tPigStoreResult components.
The two tPigLoad
components are used to load the movie data and the director data,
respectively, from HDFS into the data flow of the current Job.
The tPigMap component
is used to transform the input data.
The two tPigStoreResult components write the results into given directories in HDFS.
Double-click the label of one of the tPigLoad components to make this label editable and then enter
movie to change the label of this component.
Do the same to label the other tPigLoad component director.
Right-click the tPigLoad
component that is labelled movie, then from
the contextual menu, select Row > Pig
combine and click tPigMap to
connect this tPigLoad to the tPigMap component. This is the main link through
which the movie data is sent to tPigMap.
Do the same to connect the director
tPigLoad component to tPigMap using the Row >
Pig combine link. This is the Lookup link through which the director data is sent to
tPigMap as lookup data.
Do the same to connect the tPigMap component to one of the tPigStoreResult components using the Row > Pig
combine link, then in the pop-up wizard, name this link
out1 and click OK to validate this change.
Repeat these operations to connect the tPigMap component to the other tPigStoreResult component using the Row
> Pig combine link and name it reject.
Now the whole Job looks as follows in the workspace:
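For reference, the Pig process this Job assembles corresponds roughly to the following Pig Latin sketch. The HDFS paths, field names, delimiters, and join key below are placeholders for illustration, not the actual values used in this scenario:

```pig
-- Load the movie and director data from HDFS (paths and schemas are hypothetical)
movie = LOAD '/user/talend/movies' USING PigStorage(';')
        AS (id:int, title:chararray, director_id:int);
director = LOAD '/user/talend/directors' USING PigStorage(',')
           AS (id:int, name:chararray);

-- tPigMap joins the main flow (movie) against the lookup flow (director)
joined = JOIN movie BY director_id LEFT OUTER, director BY id;

-- out1 carries the matched records; reject carries those with no matching director
out1 = FILTER joined BY director::name IS NOT NULL;
reject = FILTER joined BY director::name IS NULL;

-- Each tPigStoreResult writes its flow into its own HDFS directory
STORE out1 INTO '/user/talend/out/movies_with_director' USING PigStorage(';');
STORE reject INTO '/user/talend/out/rejected' USING PigStorage(';');
```

In the actual Job, the join condition, the output mappings, and the reject handling are configured graphically in the tPigMap editor rather than written by hand.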