Configuring the input data - 6.5

Talend Real-Time Big Data Platform Getting Started Guide

The tHDFSInput and tFileInputDelimited components are configured to load data from HDFS into the Job.

Before you begin

  • The source files, movies.csv and directors.txt, have been uploaded into HDFS as explained in Uploading files to HDFS.

  • The metadata of the movies.csv file has been set up in the HDFS folder under the Hadoop cluster node in the Repository.

    If you have not done so, see Preparing file metadata to create the metadata.


  1. In the Repository, expand the Hadoop cluster node under the Metadata node, then expand the my_cdh Hadoop connection node and its child node. This displays the movies schema metadata node that you set up under the HDFS folder, as explained in Preparing file metadata.
  2. Drop this schema metadata node onto the movie tHDFSInput component in the workspace of the Job.
  3. Double-click the movie tHDFSInput component to open its Component view.

    The tHDFSInput component has automatically reused the HDFS configuration and the movie metadata from the Repository to define the related parameters in its Basic settings view.

  4. Double-click the director tFileInputDelimited component to open its Component view.
  5. Click the [...] button next to Edit schema to open the schema editor.
  6. Click the [+] button twice to add two rows, then, in the Column column, rename them to ID and Name, respectively.
  7. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
  8. In the Folder/File field, enter or browse to the directory where the director data is stored. As explained in Uploading files to HDFS, this data has been written to /user/ychen/input_data/directors.txt.
  9. In the Field separator field, enter a comma (,), as this is the separator used by the director data.
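
To make the effect of steps 5 to 9 more concrete, the following is a minimal Python sketch of what the director input configuration amounts to: each line of the file is split on the field separator and mapped onto the two schema columns, ID and Name. It is an illustration only, not the Java code that the Studio generates for the Job. The names Director and read_directors are hypothetical, the sample rows are invented because the content of directors.txt is not shown here, and the sketch reads from an in-memory stream rather than from HDFS.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Director(NamedTuple):
    """One record of the two-column schema defined in the schema editor."""
    ID: str
    Name: str


def read_directors(stream: TextIO, field_separator: str = ",") -> Iterator[Director]:
    """Yield one Director per non-empty line of a delimited text stream.

    Assumes every non-empty line has at least two fields, in the
    order ID, Name (the order in which the columns were added).
    """
    reader = csv.reader(stream, delimiter=field_separator)
    for row in reader:
        if not row:
            continue  # skip blank lines
        yield Director(ID=row[0].strip(), Name=row[1].strip())


# Hypothetical sample rows standing in for the real directors.txt content.
sample = io.StringIO("1,Jane Doe\n2,John Roe\n")
directors = list(read_directors(sample))
```

If the separator entered in the Field separator field does not match the one used in the file, each line is read as a single field, which in this sketch would surface as a missing second column.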


The input components are now configured to load the movie data and the director data into the Job.