Available in...
Big Data
Big Data Platform
Cloud Big Data
Cloud Big Data Platform
Cloud Data Fabric
Data Fabric
Real-Time Big Data Platform
About this task
If you often need to use the Oozie scheduler to run and
monitor Jobs on top of Hadoop, you can centralize the Oozie settings in the
Metadata folder in the Repository tree view.
Prerequisites:
-
Launch the Hadoop distribution you need to use and ensure that you have the proper
access permission to that distribution and its Oozie.
-
Create the connection to that Hadoop distribution from the Hadoop cluster node. For further information, see Centralizing a Hadoop connection.
The Oozie scheduler is used to schedule Job executions,
deploy and run Jobs in HDFS, and monitor the executions. To create an Oozie connection,
proceed as follows:
Procedure
-
Expand the Hadoop cluster node under Metadata in the Repository tree view, right-click the Hadoop connection to be used
and select Create Oozie from the contextual
menu.
-
In the connection wizard that opens, fill in the generic properties of the
connection you need to create, such as Name, Purpose and Description.
The Status field is a customized field you can
define in File > Edit project properties.
-
Click Next when completed. The second step
requires you to fill in the Oozie connection data. In the End
Point field, the URL of the Oozie web application is automatically
constructed with the host name of the NameNode of the Hadoop connection you are
using and a default Oozie port number. You can still modify this URL if necessary.
This web application also allows you to
consult the status of the scheduled Job executions in the Oozie Web Console in your web browser.
If the Hadoop distribution you select has Kerberos security enabled, the User name field is deactivated.
-
If you need to use a custom configuration for the Hadoop distribution to be used,
click the [...] button next to Hadoop properties to
open the corresponding properties table and add the property or properties to be
customized. At runtime, these changes override the corresponding default
properties used by the Studio for its Hadoop engine.
Note that a Parent Hadoop properties table is
displayed above the current properties table you are editing. This parent table is
read-only and lists the Hadoop properties that have been defined in the wizard of
the parent Hadoop connection on which the current Oozie connection is based.
-
In the User name field, enter the login user name
for Oozie, or leave this field empty to use anonymous access, in which case the user
name of the client machine is used.
-
Click Check to verify the connection.
A message pops up to indicate whether the connection is successful.
-
Click Finish to validate these changes.
The created Oozie connection is now available under the Hadoop cluster node in the Repository tree view.
Note:
This Repository view may vary depending on the
edition of the Studio you are using.
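The settings gathered in the procedure above can be sketched in a few lines of Python. This is only an illustration of the logic described in the steps, not the Studio's own code: the function names are hypothetical, the http scheme, the /oozie path and port 11000 are the usual Oozie defaults and are assumed here, and check_connection simply queries the admin status resource of the Oozie web services API, which may not be what the Check button does. It also does not handle Kerberos-secured clusters.

```python
import getpass
import json
import urllib.request

# Assumption: 11000 is the usual default Oozie port; your cluster may differ.
DEFAULT_OOZIE_PORT = 11000


def build_end_point(namenode_host, port=DEFAULT_OOZIE_PORT):
    """Build the Oozie URL from the NameNode host name and a port (hypothetical helper)."""
    return f"http://{namenode_host}:{port}/oozie"


def effective_properties(parent, custom):
    """Custom Hadoop properties override the parent connection's properties."""
    merged = dict(parent)   # copy, so the read-only parent table is left untouched
    merged.update(custom)   # entries from the current table win on conflict
    return merged


def effective_user(user_name=""):
    """An empty User name falls back to the user name of the client machine."""
    return user_name or getpass.getuser()


def check_connection(end_point, timeout=5.0):
    """Ask the Oozie server for its system mode (needs a reachable, non-Kerberos server)."""
    url = f"{end_point}/v1/admin/status"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["systemMode"]
```

For example, build_end_point("namenode01") gives http://namenode01:11000/oozie, which you could then edit by hand just as you can edit the End Point field in the wizard.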
Results
When you configure the Oozie scheduler for a Job in
the Oozie scheduler view, you can now reuse the centralized
Oozie settings.
For further information about how to use the Oozie scheduler
for a Job, see Running a Job via Oozie (Deprecated).
If you need to use an environmental context to define the parameters of this connection,
click the
Export as context button to open the
corresponding wizard and make the choice from the following options:
-
Create a new repository context: create this
environmental context out of the current Hadoop connection, that is to say, the
parameters to be set in the wizard are taken as context variables with the values
you have given to these parameters.
-
Reuse an existing repository context: use the
variables of a given environmental context to configure the current
connection.
If you need to cancel the implementation of the context, click
Revert context. The values of the context variables in use
are then written directly into this wizard.
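The relationship between exporting a connection as a context and reverting it can be sketched as follows. This is a minimal illustration under stated assumptions, not the Studio's implementation: the function names, the variable prefix and the sample parameter names are all hypothetical, and the context.name reference form is used here only to mark where a variable replaces a literal value.

```python
def export_as_context(params, prefix):
    """Turn each connection parameter into a context variable holding its current value.

    Returns (connection parameters referencing the variables, the context itself).
    """
    context = {}
    exported = {}
    for key, value in params.items():
        variable = f"{prefix}_{key}"
        context[variable] = value               # the variable keeps the value you entered
        exported[key] = f"context.{variable}"   # the connection now points at the variable
    return exported, context


def revert_context(exported, context):
    """Put the values of the context variables back directly into the parameters."""
    reverted = {}
    for key, reference in exported.items():
        if isinstance(reference, str) and reference.startswith("context."):
            reverted[key] = context[reference[len("context."):]]
        else:
            reverted[key] = reference           # already a literal value
    return reverted


# Hypothetical parameter names and values, for illustration only.
params = {"end_point": "http://namenode01:11000/oozie", "user_name": "hdfs"}
exported, context = export_as_context(params, "oozie")
restored = revert_context(exported, context)
```

Reverting an exported connection yields the original literal parameters, which mirrors what Revert context does with the values shown in the wizard.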
For a step-by-step example of how to use this Export as
context feature, see Exporting metadata as context and reusing context parameters to set up a connection.