About this task
If you often need to use the Oozie scheduler to run and monitor Jobs on top of Hadoop, you may want to centralize the Oozie settings in the Metadata folder in the Repository tree view.
Launch the Hadoop distribution you need to use and ensure that you have the proper access permission to that distribution and its Oozie.
Create the connection to that Hadoop distribution from the Hadoop cluster node. For further information, see Centralizing a Hadoop connection.
The Oozie scheduler is used to schedule executions of a Job, deploy and run Jobs in HDFS and monitor the executions. To create an Oozie connection, proceed as follows:
- Expand the Hadoop cluster node under Metadata in the Repository tree view, right-click the Hadoop connection to be used and select Create Oozie from the contextual menu.
In the connection wizard that opens up, fill in the generic properties of the connection you need to create, such as Name, Purpose and Description.
The Status field is a customized field you can define in File > Edit project properties.
Click Next when completed. The second step requires you to fill in the Oozie connection data. In the End Point field, the URL of the Oozie web application is automatically constructed with the host name of the NameNode of the Hadoop connection you are using and a default Oozie port number. You can still modify this Oozie URL if necessary. This web application also allows you to consult the status of the scheduled Job executions in the Oozie Web Console in your web browser.
If the Hadoop distribution you select enables Kerberos security, the User name field is deactivated.
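As a rough illustration of how such an End Point value can be derived, the following Python sketch builds an Oozie URL from a NameNode URI. The function name, the `http` scheme and the `/oozie` path are assumptions made for this example; 11000 is the port Oozie listens on by default, but the exact URL the wizard generates for a given distribution may differ.

```python
from urllib.parse import urlsplit

# Oozie's stock HTTP port. Assumption: the wizard proposes the same default.
DEFAULT_OOZIE_PORT = 11000


def default_oozie_endpoint(namenode_uri: str, port: int = DEFAULT_OOZIE_PORT) -> str:
    """Build an Oozie URL from the host name found in a NameNode URI.

    Hypothetical helper for illustration only; it is not part of the Studio.
    """
    host = urlsplit(namenode_uri).hostname
    if host is None:
        raise ValueError(f"No host name found in NameNode URI: {namenode_uri!r}")
    # The NameNode port (for example 8020) is dropped; only the host is reused.
    return f"http://{host}:{port}/oozie"


if __name__ == "__main__":
    print(default_oozie_endpoint("hdfs://namenode.example.com:8020"))
```

If Oozie runs on a different host than the NameNode, or on a non-default port, the generated value has to be edited by hand, which is why the field remains editable.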
If you need to use a custom configuration for the Hadoop distribution to be used, click the [...] button next to Hadoop properties to open the corresponding properties table and add the property or properties to be customized. At runtime, these changes override the corresponding default properties used by the Studio for its Hadoop engine.
Note that a Parent Hadoop properties table is displayed above the current properties table you are editing. This parent table is read-only and lists the Hadoop properties that have been defined in the wizard of the parent Hadoop connection on which the current Oozie connection is based.
For further information about the Oozie-related properties of Hadoop, see Apache's Hadoop documentation about Oozie on https://oozie.apache.org/docs, or the documentation of the Hadoop distribution you need to use. For example, the following page lists some of the Oozie-related Hadoop properties: https://oozie.apache.org/docs/4.1.0/AG_HadoopConfiguration.html.
For further information about how to leverage this properties table, see Setting reusable Hadoop properties.
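The override behavior described above can be pictured as a simple dictionary merge in which customized properties win over defaults. The sketch below is only a mental model: the function name and the default values are hypothetical, and the Studio's actual defaults are not documented here. The two `hadoop.proxyuser.oozie.*` keys are standard Hadoop proxy-user properties, used purely as sample entries.

```python
def effective_properties(defaults: dict, custom: dict) -> dict:
    """Return the properties used at runtime: custom entries override defaults.

    Hypothetical helper; it only models the precedence rule, not the Studio's code.
    """
    merged = dict(defaults)   # start from a copy of the default values
    merged.update(custom)     # every customized property replaces its default
    return merged


# Placeholder defaults, invented for this example.
studio_defaults = {
    "hadoop.proxyuser.oozie.hosts": "*",
    "hadoop.proxyuser.oozie.groups": "*",
}

# Entries as they could be typed into the Hadoop properties table.
custom_properties = {
    "hadoop.proxyuser.oozie.hosts": "oozie.example.com",
}

if __name__ == "__main__":
    for key, value in effective_properties(studio_defaults, custom_properties).items():
        print(f"{key} = {value}")
```

Properties that are not customized keep their default value, so only the entries that actually need to differ have to be added to the table.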
- In the User name field, enter the login user name for Oozie, or leave this field empty to use the anonymous access in which the user name of the client machine is used.
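The fallback for an empty User name field can be sketched as follows. This is a minimal illustration, assuming that "the user name of the client machine" corresponds to the operating-system login returned by Python's `getpass.getuser()`; the function and parameter names are hypothetical.

```python
import getpass
from typing import Optional


def resolve_oozie_user(user_name_field: str, client_user: Optional[str] = None) -> str:
    """Pick the user name sent to Oozie.

    An explicit value in the field wins; an empty field falls back to the
    client machine's user name (anonymous access as described above).
    """
    name = user_name_field.strip()
    if name:
        return name
    # client_user can be injected for testing; otherwise ask the operating system.
    return client_user if client_user is not None else getpass.getuser()


if __name__ == "__main__":
    print(resolve_oozie_user("etl_user"))  # explicit login
    print(resolve_oozie_user(""))          # falls back to the local login name
```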
Click Check to verify the connection.
A message pops up to indicate whether the connection is successful.
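Outside the Studio, a comparable check can be run against the Oozie Web Services API, whose `v1/admin/status` resource returns a JSON document with a `systemMode` field. The sketch below is an independent illustration under that assumption, not a description of what the Check button does internally; the function names are hypothetical, and a Kerberos-secured Oozie would reject this unauthenticated request.

```python
import json
from urllib.request import urlopen


def status_url(end_point: str) -> str:
    """Append the admin status resource to an Oozie End Point URL."""
    return end_point.rstrip("/") + "/v1/admin/status"


def is_oozie_up(response_body: str) -> bool:
    """Interpret the JSON status document: NORMAL means Oozie accepts requests."""
    return json.loads(response_body).get("systemMode") == "NORMAL"


def check_connection(end_point: str, timeout: float = 5.0) -> bool:
    """Return True if Oozie answers and reports NORMAL mode, False otherwise."""
    try:
        with urlopen(status_url(end_point), timeout=timeout) as resp:
            return is_oozie_up(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Network errors, HTTP errors, malformed URLs and invalid JSON
        # are all treated as a failed check.
        return False


if __name__ == "__main__":
    # Placeholder host; replace with the End Point value of your connection.
    print(check_connection("http://namenode.example.com:11000/oozie"))
```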
Click Finish to validate these changes.
The created Oozie connection is now available under the Hadoop cluster node in the Repository tree view.
Note: This Repository view may vary depending on the edition of the Studio you are using.
When you then configure the Oozie scheduler for a Job in the Oozie scheduler view, you can reuse the centralized Oozie settings.
For further information about how to use the Oozie scheduler for a Job, see Running a Job via Oozie.
The Export as context feature offers the following options:
- Create a new repository context: create this environmental context out of the current Hadoop connection, that is to say, the parameters to be set in the wizard are taken as context variables with the values you have given to these parameters.
- Reuse an existing repository context: use the variables of a given environmental context to configure the current connection.
For a step-by-step example about how to use this Export as context feature, see Exporting metadata as context and reusing context parameters to set up a connection.