You orchestrate the MapReduce components in the Job workspace to
design a data transformation process that runs in the MapReduce framework.
Procedure
-
In the Job, enter the name of the component to be used and select this component
from the list that appears. In this scenario, the components are a tHDFSInput component, a tFileInputDelimited component, a tMap
component, a tHDFSOutput component and a tFileOutputDelimited component.
-
The tHDFSInput and the tFileInputDelimited components are used to load
the movie data and the director data, respectively, from HDFS into the
data flow of the current Job.
-
The tMap component is used to
transform the input data.
-
The tHDFSOutput and the tFileOutputDelimited components write the
results into given directories in HDFS.
-
Double-click the tHDFSInput component to make its
label editable, then enter movie to change the
label of this component.
-
Do the same to change the label of the tFileInputDelimited component to
director.
-
Right-click the tHDFSInput component that is
labelled movie, then from the contextual menu,
select Row > Main and click tMap to connect it to tMap. This is the main link through which the movie data is sent to tMap.
-
Do the same to connect the director tFileInputDelimited component to tMap using the Row > Main link. Because movie already provides the main flow, this second connection becomes the Lookup link through which
the director data is sent to tMap as lookup
data.
-
Do the same to connect the tMap
component to tHDFSOutput using the Row > Main link, then in the pop-up wizard, name this link
out1 and click OK to validate this change.
-
Repeat these operations to connect the tMap component to the tFileOutputDelimited
component using the Row > Main link and name it
reject.
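
The data flow wired up in the steps above can be pictured with a small sketch. This is not the code Talend generates. It is an illustration in plain Python that assumes tMap is configured to join each movie row to the director lookup on a shared key, sending matched rows to out1 and unmatched rows to reject. The column names (movieID, title, directorID, directorName), the sample rows and the function run_tmap are hypothetical, since the schemas and the tMap configuration are not defined in this section.

```python
# Hypothetical sample rows standing in for the movie (main) and
# director (lookup) inputs; the real schemas are not given here.
movies = [
    {"movieID": 1, "title": "Movie A", "directorID": 10},
    {"movieID": 2, "title": "Movie B", "directorID": 99},
]
directors = [
    {"directorID": 10, "directorName": "Director X"},
]


def run_tmap(main_rows, lookup_rows, key):
    """Join main rows to lookup rows on `key` (assumed tMap behaviour).

    Rows that find a lookup match go to out1; rows without a match
    go to reject.
    """
    # Index the lookup flow by its key, as a lookup link implies.
    lookup = {row[key]: row for row in lookup_rows}
    out1, reject = [], []
    for row in main_rows:
        match = lookup.get(row[key])
        if match is None:
            reject.append(dict(row))
        else:
            # Merge the main row with the matching lookup row.
            out1.append({**row, **match})
    return out1, reject


out1, reject = run_tmap(movies, directors, "directorID")
print("out1:", out1)
print("reject:", reject)
```

In the sketch, the main flow drives the loop and the lookup flow is only consulted per row, which mirrors the roles of the main link and the Lookup link in the Job.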
Results
In the workspace, the whole Job looks like this: