How to set HDFS connection details - 6.5

Talend Big Data Studio User Guide

Before you can run or schedule executions of a Job on an HDFS server, you first need to define the HDFS connection details in the Oozie scheduler view and specify the path where your Job will be deployed.

Defining HDFS connection details in Oozie scheduler view

To define HDFS connection details in the Oozie scheduler view, do the following:

  1. Click the Oozie scheduler view beneath the design workspace.

  2. Click Setting to open the connection setup dialog box.

    Warning

    The connection settings shown in this procedure are examples only.

    • If you have set up the Oozie connection in the Repository as explained in Centralizing an Oozie connection, you can easily reuse it.

      To do this, select Repository from the Property type drop-down list, then click the [...] button to open the [Repository Content] dialog box and select the Oozie connection to be used.

    • Otherwise, fill in the connection information in the corresponding fields as explained in the table below:

      Field/Option - Description

      Hadoop distribution

      Hadoop distribution to be connected to. This distribution hosts the HDFS file system to be used. If you select Custom to connect to a custom Hadoop distribution, click the [...] button to open the [Import custom definition] dialog box and, from that dialog box, import the jar files required by the custom distribution.

      For further information, see Connecting to custom Hadoop distribution.

      Hadoop version

      Version of the Hadoop distribution to be connected to. This list disappears if you select Custom from the Hadoop distribution list.

      Enable kerberos security

      If the Hadoop cluster you are accessing runs with Kerberos security, select this check box, then enter the Kerberos principal name for the NameNode in the field that appears. This enables you to use your user name to authenticate against the credentials stored in Kerberos. A minimal client-side login sketch is shown after this table.

      Whether this check box is available depends on the Hadoop distribution you are connecting to.

      User Name

      Login user name.

      Name node end point

      URI of the name node, the centerpiece of the HDFS file system. Example values for the three end points are shown in the sketch after this table.

      Job tracker end point

      URI of the Job Tracker node, which farms out MapReduce tasks to specific nodes in the cluster.

      Oozie end point

      URI of the Oozie web console, for Job execution monitoring.

      Hadoop Properties

      If you need to use a custom configuration for the Hadoop cluster you are connecting to, complete this table with the property or properties to be customized. At runtime, these settings override the corresponding default properties used by the Studio for its Hadoop engine, as illustrated in the configuration sketch after this table.

      For further information about the properties required by Hadoop, see Apache's Hadoop documentation on http://hadoop.apache.org, or the documentation of the Hadoop distribution you need to use.

      Note

      Settings defined in this table are effective on a per-Job basis.
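
To give an idea of what the Kerberos option corresponds to at the Hadoop client level, here is a minimal sketch, assuming the Hadoop client libraries are on the classpath. The principal and keytab path are hypothetical placeholders; the Studio performs an equivalent login internally from the values you enter in the dialog box.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.security.UserGroupInformation;

  public class KerberosLoginSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Tell the Hadoop client that the cluster expects Kerberos.
          conf.set("hadoop.security.authentication", "kerberos");
          UserGroupInformation.setConfiguration(conf);

          // Hypothetical principal and keytab path; replace with your own.
          UserGroupInformation.loginUserFromKeytab(
                  "myuser@EXAMPLE.COM",
                  "/etc/security/keytabs/myuser.keytab");

          System.out.println("Logged in as: "
                  + UserGroupInformation.getCurrentUser().getUserName());
      }
  }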
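
The end point values in the next sketch are examples only: host names and ports are hypothetical and the defaults vary from one Hadoop distribution to another. The sketch simply checks that a name node URI is reachable through the standard Hadoop client API, assuming the Hadoop client jars are on the classpath.

  import java.net.URI;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class EndpointSketch {
      public static void main(String[] args) throws Exception {
          String nameNode   = "hdfs://namenode.example.com:8020";     // Name node end point
          String jobTracker = "jobtracker.example.com:8021";          // Job tracker end point
          String oozie      = "http://oozie.example.com:11000/oozie"; // Oozie end point

          // Basic connectivity check against the name node as a given user.
          FileSystem fs = FileSystem.get(URI.create(nameNode), new Configuration(), "myuser");
          System.out.println("HDFS root reachable: " + fs.exists(new Path("/")));
          fs.close();

          System.out.println("Job tracker end point: " + jobTracker);
          System.out.println("Oozie end point: " + oozie);
      }
  }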
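
Finally, the following sketch shows what an entry in the Hadoop Properties table amounts to on the client side: an explicit override of a default Hadoop property. It assumes the Hadoop client jars are on the classpath; the property names and values are examples only, so use the ones your cluster actually requires.

  import org.apache.hadoop.conf.Configuration;

  public class HadoopPropertiesSketch {
      public static void main(String[] args) {
          // Loads the *-default.xml and *-site.xml files found on the classpath, if any.
          Configuration conf = new Configuration();

          System.out.println("Default dfs.replication = " + conf.get("dfs.replication"));

          // Entries added to the Hadoop Properties table play the same role as
          // these explicit overrides of the defaults.
          conf.set("dfs.replication", "1");
          conf.set("dfs.client.use.datanode.hostname", "true");

          System.out.println("Overridden dfs.replication = " + conf.get("dfs.replication"));
      }
  }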

Once you have defined the HDFS connection details and the deployment path in the Oozie scheduler view, you are ready to schedule executions of your Job on the HDFS server, or to run it immediately.