Before you begin
The source files, movies.csv and directors.txt, have been uploaded to HDFS as explained in Uploading files to HDFS.
The metadata of the movies.csv file has been set up in the HDFS folder under the Hadoop cluster node in the Repository.
If you have not done so, see Preparing file metadata to create the metadata.
- Expand the Hadoop cluster node under the Metadata node in the Repository, then expand the my_cdh Hadoop connection node and its child node to display the movies schema metadata node you have set up under the HDFS folder, as explained in Preparing file metadata.
- Drop this schema metadata node onto the movie tHDFSInput component in the workspace of the Job.
- Double-click the movie tHDFSInput component to open its Component view.
This tHDFSInput component automatically reuses the HDFS configuration and the movie schema metadata from the Repository to define the related parameters in its Basic settings view.
- Double-click the director tFileInputDelimited component to open its Component view.
- Click the [...] button next to Edit schema to open the schema editor.
- Click the [+] button twice to add two rows and, in the Column column, rename them ID and Name, respectively.
- Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
- In the Folder/File field, enter or browse to the directory where the director data is stored. As explained in Uploading files to HDFS, this data was written to /user/ychen/input_data/directors.txt.
- In the Field separator field, enter a comma (,), as this is the separator used in the director data.
The input components are now configured to load the movie data and the director data into the Job.
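Outside the Studio, the effect of the director tFileInputDelimited configuration can be sketched in plain Python: split each comma-delimited record into the two schema columns, ID and Name, defined above. This is only an illustrative analogue, not Talend code; the sample records are made up, and an in-memory stream stands in for the directors.txt file in HDFS.

```python
import csv
import io

# Made-up sample records in the same shape as directors.txt:
# comma-delimited, two columns matching the ID and Name schema.
sample = io.StringIO("1,Alice Smith\n2,Bob Jones\n")

# The field separator "," mirrors the Field separator setting
# in the tFileInputDelimited Basic settings view.
directors = [
    {"ID": row[0], "Name": row[1]}
    for row in csv.reader(sample, delimiter=",")
]

for d in directors:
    print(d["ID"], d["Name"])
```

Each parsed row becomes one record with the ID and Name fields, which is what the component passes downstream once the Job runs.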