Centralizing an Oozie connection - 6.1

Talend Real-time Big Data Platform Studio User Guide


If you often need to use the Oozie scheduler to run and monitor Jobs on top of Hadoop, you may want to centralize the Oozie settings in the Metadata folder in the Repository tree view.

Prerequisites:

  • Launch the Hadoop distribution you need to use and ensure that you have the proper access permissions to that distribution and its Oozie service.

  • Create the connection to that Hadoop distribution from the Hadoop cluster node. For further information, see Centralizing a Hadoop connection.

The Oozie scheduler is used to schedule executions of a Job, deploy and run Jobs in HDFS and monitor the executions. To create an Oozie connection, proceed as follows:

  1. Expand the Hadoop cluster node under Metadata in the Repository tree view, right-click the Hadoop connection to be used and select Create Oozie from the contextual menu.

  2. In the connection wizard that opens, fill in the generic properties of the connection you need to create, such as Name, Purpose and Description. The Status field is a customized field you can define in File > Edit project properties.

  3. Click Next when completed. The second step requires you to fill in the Oozie connection data. In the End Point field, the URL of the Oozie web application is automatically constructed from the host name of the NameNode of the Hadoop connection you are using and a default Oozie port number. You can still modify this Oozie URL if necessary. This web application also allows you to consult the status of the scheduled Job executions in the Oozie Web Console in your web browser.

    If the Hadoop distribution you select enables Kerberos security, the User name field is deactivated.

  4. If you need to use a custom configuration for the Hadoop distribution to be used, click the [...] button next to Hadoop properties to open the corresponding properties table and add the property or properties to be customized. At runtime, these changes override the corresponding default properties used by the Studio for its Hadoop engine.

    Note that a Parent Hadoop properties table is displayed above the properties table you are editing. This parent table is read-only and lists the Hadoop properties that have been defined in the wizard of the parent Hadoop connection on which the current Oozie connection is based.

    For further information about the Oozie-related properties of Hadoop, see Apache's Hadoop documentation about Oozie on https://oozie.apache.org/docs, or the documentation of the Hadoop distribution you need to use. For example, the following page lists some of the Oozie-related Hadoop properties: https://oozie.apache.org/docs/4.1.0/AG_HadoopConfiguration.html.

    For further information about how to leverage this properties table, see Setting reusable Hadoop properties.

  5. In the User name field, enter the login user name for Oozie, or leave this field empty to use anonymous access, in which case the user name of the client machine is used.

  6. Click Check to verify the connection.

    A message pops up to indicate whether the connection is successful.

  7. Click Finish to validate these changes.

    The created Oozie connection is now available under the Hadoop cluster node in the Repository tree view.

    Note

    This Repository view may vary depending on the edition of the Studio you are using.
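As an illustration of how the End Point value in step 3 is put together, the following Python sketch assembles an Oozie URL from a NameNode host name. It is only a sketch under stated assumptions: 11000 is the usual default Oozie port and /oozie the usual web application path, but the exact string the Studio generates may differ. The function name and the host name are hypothetical.

```python
# Usual default port of an Oozie server (assumption: your distribution may use another one).
DEFAULT_OOZIE_PORT = 11000


def build_oozie_end_point(namenode_host, port=DEFAULT_OOZIE_PORT, scheme="http"):
    """Return an Oozie web application URL for the given host (hypothetical helper)."""
    host = namenode_host.strip()
    if not host:
        raise ValueError("The NameNode host name must not be empty")
    return f"{scheme}://{host}:{port}/oozie"


# "namenode.example.com" is a placeholder host name, not a value from the Studio.
print(build_oozie_end_point("namenode.example.com"))
# prints http://namenode.example.com:11000/oozie
```

If your Oozie server runs on a different host or port than the NameNode default, this is the value you would edit manually in the End Point field.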

When you then configure the Oozie scheduler for a Job in the Oozie scheduler view, you can reuse these centralized Oozie settings.

For further information about how to use the Oozie scheduler for a Job, see How to run a Job via Oozie.
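The override behavior of the Hadoop properties table described in step 4 can be pictured as a layered merge. The Python sketch below is not the Studio's implementation: it assumes that properties set on the Oozie connection take precedence over those inherited from the parent Hadoop connection, which in turn take precedence over the Studio defaults. The function name and all property values are illustrative.

```python
def effective_hadoop_properties(studio_defaults, parent_properties, oozie_properties):
    """Merge three property layers; later layers win on duplicate keys (assumed precedence)."""
    effective = dict(studio_defaults)      # copy so the input dictionaries stay untouched
    effective.update(parent_properties)    # parent Hadoop connection overrides the defaults
    effective.update(oozie_properties)     # current Oozie connection overrides both
    return effective


# Illustrative values only; they are not taken from any real cluster.
studio_defaults = {"mapreduce.job.queuename": "default"}
parent_properties = {"mapreduce.job.queuename": "etl"}
oozie_properties = {"oozie.use.system.libpath": "true"}

effective = effective_hadoop_properties(studio_defaults, parent_properties, oozie_properties)
for key in sorted(effective):
    print(f"{key}={effective[key]}")
# prints:
# mapreduce.job.queuename=etl
# oozie.use.system.libpath=true
```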

If you need to use an environmental context to define the parameters of this connection, click the Export as context button to open the corresponding wizard and choose one of the following options:

  • Create a new repository context: create this environmental context out of the current connection, that is to say, the parameters to be set in the wizard are taken as context variables with the values you have given to these parameters.

  • Reuse an existing repository context: use the variables of a given environmental context to configure the current connection.

If you need to cancel the implementation of the context, click Revert context. The values of the context variables in use are then written directly into this wizard.
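To make the relationship between Export as context and Revert context more concrete, the following Python sketch models both operations on a plain dictionary of connection parameters. It is a conceptual model only: the variable naming scheme (a prefix followed by the parameter name), the parameter names and both function names are hypothetical and do not reflect how the Studio stores contexts internally.

```python
CONTEXT_PREFIX = "context."


def export_as_context(parameters, prefix):
    """Replace each literal value with a context reference and collect the variables."""
    context_variables = {}
    exported = {}
    for name, value in parameters.items():
        variable = f"{prefix}_{name}"          # hypothetical naming scheme
        context_variables[variable] = value    # the variable keeps the value you entered
        exported[name] = CONTEXT_PREFIX + variable
    return exported, context_variables


def revert_context(exported, context_variables):
    """Put the values of the context variables back into the parameters."""
    reverted = {}
    for name, value in exported.items():
        if isinstance(value, str) and value.startswith(CONTEXT_PREFIX):
            reverted[name] = context_variables[value[len(CONTEXT_PREFIX):]]
        else:
            reverted[name] = value             # literal values are left as they are
    return reverted


# Placeholder connection parameters for illustration.
parameters = {"EndPoint": "http://namenode.example.com:11000/oozie", "UserName": "etl_user"}
exported, context_variables = export_as_context(parameters, "my_oozie")
print(exported["EndPoint"])
# prints context.my_oozie_EndPoint
print(revert_context(exported, context_variables) == parameters)
# prints True
```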

For a step-by-step example of how to use this Export as context feature, see Exporting metadata as context and reusing context parameters to set up a connection.