Before you begin
- The connection to the Hadoop cluster that hosts the HDFS system to be used has been set up from the Hadoop cluster node in the Repository. For further information about how to create this connection, see Setting up Hadoop connection manually.
- The Hadoop cluster to be used is properly configured and running, and you have the appropriate access permissions to that distribution and its HDFS.
- The client machine on which Talend Studio is installed can resolve the host names of the nodes of the Hadoop cluster to be used. To ensure this, add the IP address/hostname mapping entries for the services of that Hadoop cluster to the hosts file of the client machine. For example, if the host name of the Hadoop NameNode server is talend-cdh550.weave.local and its IP address is 192.168.x.x, the mapping entry reads 192.168.x.x talend-cdh550.weave.local.
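The hosts-file mapping described above can be checked with a short script before creating the connection. The following is a minimal sketch, not part of Talend Studio: the function name find_host_mapping and the SAMPLE_HOSTS text are hypothetical, and the address 192.168.x.x is the placeholder from the example, to be replaced with the real IP address of the NameNode.

```python
from typing import Optional


def find_host_mapping(hosts_text: str, hostname: str) -> Optional[str]:
    """Return the address mapped to hostname in hosts-file text, or None."""
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        # A hosts entry is an address followed by one or more host names.
        fields = line.split()
        address, names = fields[0], fields[1:]
        # Host names are compared case-insensitively.
        if hostname.lower() in (name.lower() for name in names):
            return address
    return None


# Hypothetical sample content; 192.168.x.x is the placeholder address
# from the example above, not a real IP address.
SAMPLE_HOSTS = """\
127.0.0.1    localhost
192.168.x.x  talend-cdh550.weave.local
"""

if __name__ == "__main__":
    print(find_host_mapping(SAMPLE_HOSTS, "talend-cdh550.weave.local"))
```

To check a real client machine, pass the content of its hosts file instead of SAMPLE_HOSTS. That file is typically /etc/hosts on Linux and macOS and C:\Windows\System32\drivers\etc\hosts on Windows. A return value of None means the mapping entry for that host name is missing.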
Procedure
Results
The new HDFS connection is now available under the Hadoop cluster node in the Repository tree view. You can then use it to define and centralize the schemas of the files stored in the connected HDFS system in order to reuse these schemas in a Talend Job.