Defining HDFS connection details in Oozie scheduler view - 7.1

Talend Big Data Studio User Guide


Talend Oozie allows you to schedule executions of Jobs you have designed with the Studio.

Before you can run or schedule executions of a Job on an HDFS server, you first need to define the HDFS connection details in the Oozie scheduler view and specify the path where your Job will be deployed.

Procedure

  1. Click the Oozie scheduler view beneath the design workspace.
  2. Click Setting to open the connection setup dialog box.
  3. Set up the Oozie connection.
    • If you have set up the Oozie connection in the Repository as explained in Centralizing an Oozie connection, you can easily reuse it. To do this, select Repository from the Property type drop-down list, then click the [...] button to open the Repository Content dialog box and select the Oozie connection to be used.

    • If you have not set up the Oozie connection, fill in the connection information in the corresponding fields as explained in the table below:

      Field/Option | Description

      Hadoop distribution

      Hadoop distribution to be connected to. This distribution hosts the HDFS file system to be used. If you select Custom to connect to a custom Hadoop distribution, click the [...] button to open the Import custom definition dialog box and import the jar files required by that custom distribution.

      For further information, see Connecting to custom Hadoop distribution.

      Hadoop version

      Version of the Hadoop distribution to be connected to. This list disappears if you select Custom from the Hadoop distribution list.

      Enable kerberos security

      If you are accessing a Hadoop cluster running with Kerberos security, select this check box, then enter the Kerberos principal name for the NameNode in the field that is displayed, typically of the form nn/<namenode-host>@<REALM>. This enables you to use your user name to authenticate against the credentials stored in Kerberos.

      This check box is available depending on the Hadoop distribution you are connecting to.

      User Name

      Login user name.

      Name node end point

      URI of the name node, the centerpiece of the HDFS file system.

      Job tracker end point

      URI of the Job Tracker node, which farms out MapReduce tasks to specific nodes in the cluster.

      Oozie end point

      URI of the Oozie web console, for Job execution monitoring.

      Hadoop Properties

      If you need to use a custom configuration for the Hadoop cluster to be used, complete this table with the property or properties to be customized. At runtime, these changes override the corresponding default properties used by the Studio for its Hadoop engine. For illustrations of the endpoint values and of this kind of property override, see the sketches after this table.

      For further information about the properties required by Hadoop, see Apache's Hadoop documentation on http://hadoop.apache.org, or the documentation of the Hadoop distribution you need to use.

      Note:

      Settings defined in this table are effective on a per-Job basis.
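The Studio submits and monitors the Job for you once these fields are filled in, but as an illustration of what the endpoint fields refer to, here is a minimal sketch using the standard Oozie Java client (org.apache.oozie.client.OozieClient). All host names, ports, the user name and the HDFS application path below are placeholder assumptions, not values supplied by the Studio.

    import java.util.Properties;

    import org.apache.oozie.client.OozieClient;
    import org.apache.oozie.client.OozieClientException;

    public class OozieConnectionSketch {
        public static void main(String[] args) throws OozieClientException {
            // Oozie end point: URI of the Oozie server / web console.
            OozieClient client = new OozieClient("http://oozie-host.example.com:11000/oozie");

            // Workflow configuration; nameNode and jobTracker are the conventional
            // parameter names referenced from an Oozie workflow definition.
            Properties conf = client.createConfiguration();
            conf.setProperty("nameNode", "hdfs://namenode-host.example.com:8020");   // Name node end point
            conf.setProperty("jobTracker", "jobtracker-host.example.com:8021");      // Job tracker end point
            conf.setProperty("user.name", "talend_user");                            // User Name

            // HDFS path where the workflow application (the deployed Job) is stored.
            conf.setProperty(OozieClient.APP_PATH,
                    "hdfs://namenode-host.example.com:8020/user/talend_user/my_job");

            // Submit and start the workflow, then print the Oozie job ID used for monitoring.
            String jobId = client.run(conf);
            System.out.println("Submitted Oozie job: " + jobId);
        }
    }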
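Similarly, the Hadoop Properties table overrides Hadoop configuration keys. The following sketch shows, with placeholder keys and values that are assumptions rather than recommended settings, what such an override means in terms of the client-side Hadoop configuration.

    import org.apache.hadoop.conf.Configuration;

    public class HadoopPropertiesSketch {
        public static void main(String[] args) {
            // Default client-side Hadoop configuration (core-default.xml, core-site.xml, ...).
            Configuration conf = new Configuration();

            // Overriding properties, as you would by adding rows to the Hadoop Properties table.
            // The keys and values below are placeholder examples only.
            conf.set("dfs.replication", "1");
            conf.set("mapreduce.map.memory.mb", "2048");

            // The overridden value now takes precedence over the default.
            System.out.println("dfs.replication = " + conf.get("dfs.replication"));
        }
    }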

Results

Once you have defined the HDFS connection details and the deployment path in the Oozie scheduler view, you are ready to schedule executions of your Job on the HDFS server, or run it immediately.