You orchestrate the MapReduce components in the Job workspace in
order to design a data transformation process that runs in the MapReduce framework.
In the Job, enter the name of the component to be used and select this component
from the list that appears. In this scenario, the components are a tHDFSInput component,
a tFileInputDelimited component, a tMap
component, a tHDFSOutput component and a tFileOutputDelimited component.
The tHDFSInput and the tFileInputDelimited components are used to load
the movie data and the director data, respectively, from HDFS into the
data flow of the current Job.
The tMap component is used to
transform the input data.
The tHDFSOutput and the tFileOutputDelimited components write the
results into given directories in HDFS.
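To make the roles of these components more concrete, the following Python sketch mimics the same data flow outside the Studio. It is only an illustration under stated assumptions: the field names (title, director_id, director_name), the sample rows and the function join_with_lookup are hypothetical, and routing unmatched rows to the second output is an assumption, since this section does not state what the second output receives.

```python
# Illustrative only: plain Python standing in for the Job's components.
# All field names and sample rows are hypothetical, not taken from the scenario.

def join_with_lookup(main_rows, lookup_rows, key):
    """Mimic a tMap-style join of a main flow against a lookup flow."""
    # Index the lookup flow (director data) by the join key.
    index = {row[key]: row for row in lookup_rows}
    matched, unmatched = [], []
    for row in main_rows:  # main flow (movie data)
        hit = index.get(row[key])
        if hit is None:
            # Assumption: rows without a lookup match go to the second output.
            unmatched.append(row)
        else:
            # Merge the movie row with its matching director row.
            matched.append({**row, **hit})
    return matched, unmatched


# Stand-ins for the two input components.
movies = [
    {"title": "A", "director_id": 1},
    {"title": "B", "director_id": 2},
    {"title": "C", "director_id": 9},
]
directors = [
    {"director_id": 1, "director_name": "X"},
    {"director_id": 2, "director_name": "Y"},
]

# Stand-ins for the two output components.
matched, unmatched = join_with_lookup(movies, directors, "director_id")
```

In the actual Job, the join condition and the output schemas are configured in the tMap editor rather than written by hand.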
Double-click the tHDFSInput component to make its
label editable and then enter movie to change the
label of this component.
Do the same to label tFileInputDelimited to director.
Right-click the tHDFSInput component that is
labelled movie, then from the contextual menu,
select Row > Main and click tMap to connect it to tMap. This is the main link through which the movie data is sent to tMap.
Do the same to connect the director tFileInputDelimited component to tMap using the Row > Main link. This is the Lookup link through which
the director data is sent to tMap as lookup data.
Do the same to connect the tMap
component to tHDFSOutput using the Row > Main link, then in the pop-up wizard, name this link
out1 and click OK to validate this change.
Repeat these operations to connect the tMap component to the tFileOutputDelimited
component using the Row > Main link and name it to
In the workspace, the whole Job looks like this: