The tHDFSInput and
tFileInputDelimited components are configured to load data
from HDFS into the Job.
Procedure
-
Expand the Hadoop cluster node
under the Metadata node in the Repository, then expand the my_cdh Hadoop connection node and its child node to display the
movies schema metadata node you have set up under
the HDFS folder, as explained in Preparing file metadata.
-
Drop this schema metadata node onto the movie tHDFSInput component in the workspace of the
Job.
-
Double-click the movie tHDFSInput component to open its Component view.
This tHDFSInput component has
automatically reused the HDFS configuration and the movie schema metadata from the
Repository to define the related parameters in its
Basic settings view.
-
Double-click the director tFileInputDelimited component to open its Component view.
-
Click the [...] button next to Edit schema to open the schema editor.
-
Click the [+] button twice to add two rows, and in
the Column column, rename them ID and Name, respectively.
-
Click OK to validate these changes and accept the
propagation prompted by the pop-up dialog box.
-
In the Folder/File field, enter or
browse to the location where the director data is stored. As explained in Uploading files to HDFS, this data has been written to /user/ychen/input_data/directors.txt.
-
In the Field separator field, enter a
comma (,), as this is the separator used by the director data.
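The director steps above amount to reading a two-column, comma-delimited file into an ID/Name schema. As a minimal sketch of that interpretation, here is a Python snippet; the sample rows are hypothetical, and the real data resides in /user/ychen/input_data/directors.txt on HDFS:

```python
import csv
import io

# Hypothetical sample standing in for the director data on HDFS;
# each row is "ID,Name", matching the schema defined in the step above.
sample = io.StringIO("1,Gregg Araki\n2,P.J. Hogan\n")

# Parse with a comma field separator, as set in the Field separator field.
reader = csv.reader(sample, delimiter=",")
directors = [{"ID": row[0], "Name": row[1]} for row in reader]

for record in directors:
    print(record)
```

This is only an illustration of the expected data shape; in the Job itself, tFileInputDelimited performs the parsing according to the schema and separator you configured.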
Results
The input components are now configured to load the movie data and the
director data into the Job.