Centralizing an Oozie connection (Deprecated) - Cloud


Talend Studio User Guide

Version: Cloud, 7.3
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-03-20
Available in: Big Data, Big Data Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Data Fabric, Real-Time Big Data Platform

About this task

If you often use the Oozie scheduler to run and monitor Jobs on top of Hadoop, you may want to centralize the Oozie settings in the Metadata folder of the Repository tree view.

Prerequisites:

Launch the Hadoop distribution you need to use and ensure that you have the proper access permissions for that distribution and its Oozie.
Create the connection to that Hadoop distribution from the Hadoop cluster node. For further information, see Centralizing a Hadoop connection.

The Oozie scheduler is used to schedule Job executions, to deploy and run Jobs in HDFS, and to monitor those executions. To create an Oozie connection, proceed as follows:

Procedure

Expand the Hadoop cluster node under Metadata in the Repository tree view, right-click the Hadoop connection to be used and select Create Oozie from the contextual menu.
In the connection wizard that opens, fill in the generic properties of the connection you need to create, such as Name, Purpose and Description. The Status field is a customized field you can define in File > Edit project properties.
Click Next when completed. The second step requires you to fill in the Oozie connection data.
In the End Point field, the URL of the Oozie web application is automatically constructed from the host name of the NameNode of the Hadoop connection you are using and a default Oozie port number. You can still modify this Oozie URL if necessary. This web application also lets you consult the status of the scheduled Job executions in the Oozie Web Console in your web browser.
If the Hadoop distribution you select has Kerberos security enabled, the User name field is deactivated.
If you need to use a custom configuration for the Hadoop distribution to be used, click the [...] button next to Hadoop properties to open the corresponding properties table and add the property or properties to be customized. At runtime, these changes override the corresponding default properties used by the Studio for its Hadoop engine.
Note that a Parent Hadoop properties table is displayed above the properties table you are editing. This parent table is read-only and lists the Hadoop properties that have been defined in the wizard of the parent Hadoop connection on which the current Oozie connection is based.

For further information about the Oozie-related properties of Hadoop, see Apache Hadoop documentation about Oozie, or the documentation of the Hadoop distribution you need to use. For example, this page lists some of the Oozie-related Hadoop properties.

For further information about how to leverage this properties table, see Setting reusable Hadoop properties.
In the User name field, enter the login user name for Oozie, or leave this field empty to use anonymous access, in which case the user name of the client machine is used.
Click Check to verify the connection.
A message pops up to indicate whether the connection is successful.
Click Finish to validate these changes.
The created Oozie connection is now available under the Hadoop cluster node in the Repository tree view.

Note:
This Repository view may vary depending on the edition of the Studio you are using.
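
As an illustration of how an End Point of the kind described in the procedure can be put together, the following minimal Python sketch builds an Oozie URL from a NameNode host name and a port, then derives the URL of Oozie's admin status resource, which can be opened in a browser or queried to see whether the server answers. It is a sketch under stated assumptions, not the logic of the wizard or of its Check button: the host name and user name are hypothetical, the port 11000 is Oozie's usual default and is only assumed to match the one the wizard proposes, and the helper names are invented for this example.

```python
from typing import Optional
from urllib.parse import urlencode

# Oozie's usual default port; assumed here, the wizard may propose another one.
DEFAULT_OOZIE_PORT = 11000


def build_oozie_endpoint(namenode_host: str, port: int = DEFAULT_OOZIE_PORT) -> str:
    """Build an End Point URL from a NameNode host name and an Oozie port."""
    return f"http://{namenode_host}:{port}/oozie"


def status_url(end_point: str, user_name: Optional[str] = None) -> str:
    """Return the URL of Oozie's admin status resource for an End Point.

    When a user name is given, it is passed as the user.name query
    parameter; when it is omitted, no user is named in the URL.
    """
    url = end_point.rstrip("/") + "/v1/admin/status"
    if user_name:
        url += "?" + urlencode({"user.name": user_name})
    return url


# Hypothetical host and user, for illustration only.
end_point = build_oozie_endpoint("namenode.example.com")
print(end_point)
print(status_url(end_point, "talend_user"))
```

If the Oozie URL has been modified in the wizard, the modified value can be passed to status_url directly instead of being rebuilt from the host name.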

Results

Then when you configure the Oozie scheduler for a Job in the Oozie scheduler view, you can reuse the centralized Oozie settings.

For further information about how to use Oozie scheduler for a Job, see Running a Job via Oozie (Deprecated).
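
The override behavior of the Hadoop properties table described in the procedure can be pictured as a simple layering of key-value pairs. The Python sketch below is only an illustration: the default values are hypothetical stand-ins for whatever the Studio uses for its Hadoop engine, the function name is invented, and the read-only Parent Hadoop properties table is not modeled because its precedence is not stated here.

```python
from typing import Dict

# Hypothetical defaults; the real values used by the Studio are not shown here.
studio_defaults: Dict[str, str] = {
    "hadoop.proxyuser.oozie.hosts": "localhost",
    "hadoop.proxyuser.oozie.groups": "users",
}

# Entries added in the Hadoop properties table of the Oozie connection.
custom_properties: Dict[str, str] = {
    "hadoop.proxyuser.oozie.hosts": "*",
}


def effective_properties(defaults: Dict[str, str], custom: Dict[str, str]) -> Dict[str, str]:
    """Return the properties seen at runtime: custom entries override defaults."""
    merged = dict(defaults)  # copy so the defaults are left untouched
    merged.update(custom)    # customized entries replace matching defaults
    return merged


for name, value in effective_properties(studio_defaults, custom_properties).items():
    print(f"{name}={value}")
```

Only the property that was customized changes; every default that has no custom entry is kept as it was.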

If you need to use an environmental context to define the parameters of this connection, click the Export as context button to open the corresponding wizard and make the choice from the following options:

Create a new repository context: create this environmental context out of the current Hadoop connection, that is to say, the parameters to be set in the wizard are taken as context variables with the values you have given to these parameters.
Reuse an existing repository context: use the variables of a given environmental context to configure the current connection.

If you need to cancel the implementation of the context, click Revert context. The values of the context variables in use are then written directly into this wizard.

For a step-by-step example about how to use this Export as context feature, see Exporting metadata as context and reusing context parameters to set up a connection.
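
The effect of exporting parameters as context variables and reverting them can be sketched as follows in Python. This is a conceptual model only, not the Studio's implementation: the parameter names, the oozie prefix and the helper functions are hypothetical, and the only convention borrowed from the Studio is that a context variable is referenced as context.variableName.

```python
import re
from typing import Dict, Tuple

# Matches a value that consists solely of a context variable reference.
CONTEXT_REF = re.compile(r"context\.(\w+)")


def export_as_context(params: Dict[str, str], prefix: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Turn each parameter into a context variable holding its current value."""
    variables = {f"{prefix}_{name}": value for name, value in params.items()}
    exported = {name: f"context.{prefix}_{name}" for name in params}
    return exported, variables


def revert_context(exported: Dict[str, str], variables: Dict[str, str]) -> Dict[str, str]:
    """Put the values of the context variables directly back into the parameters."""
    reverted = {}
    for name, value in exported.items():
        match = CONTEXT_REF.fullmatch(value)
        reverted[name] = variables[match.group(1)] if match else value
    return reverted


# Hypothetical connection parameters, for illustration only.
connection = {
    "EndPoint": "http://namenode.example.com:11000/oozie",
    "UserName": "talend_user",
}

exported, variables = export_as_context(connection, "oozie")
print(exported)
print(revert_context(exported, variables))
```

Reusing an existing context corresponds, in this model, to resolving the same exported parameters against a different set of variable values, for example one set per environment.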