In the Job Conductor page of Talend Administration Center, you define an
execution task to gather the script generation, deployment and execution phases of your
MapReduce and Spark Batch Jobs.
Before you begin
-
Ensure that the client machine on which the Talend Jobs are executed can
recognize the host names of the nodes of the Hadoop cluster to be used. For this purpose, add
the IP address/hostname mapping entries for the services of that Hadoop cluster in the
hosts file of the client machine.
In this use case, this machine is the one on which Talend Runtime is
installed.
-
The Hadoop cluster to be used has been properly configured and is
running.
-
The administrator of the cluster has granted the username to be used
read/write permissions on the related data and directories in
HDFS.
-
You have created the use case Jobs described in the previous
sections and run them successfully from the Studio.
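The host name prerequisite above can be checked from the client machine before you create the task. The following is a minimal Python sketch, not part of the Talend tooling: the function name `unresolved_hosts` is hypothetical, and it relies on the local resolver, which consults the hosts file along with any other configured name service.

```python
import socket


def unresolved_hosts(hostnames):
    """Return the host names that the local resolver cannot map to an IP address."""
    missing = []
    for name in hostnames:
        try:
            # Uses the machine's resolver, which reads the hosts file
            # in addition to any configured DNS servers.
            socket.gethostbyname(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(name)
    return missing
```

For example, calling `unresolved_hosts(["namenode.example.com", "datanode1.example.com"])` with the node names of your own cluster (the names shown here are placeholders) returns the names that still need a mapping entry in the hosts file. An empty list means every name was resolved.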
Procedure
-
Log in to Talend Administration Center with the
account you have created in Setting up your first user and project.
-
In the Menu tree view of your
Talend Administration Center, click
Job Conductor to display the Job Conductor page.
-
From the toolbar on the Job
Conductor page, click Add
> Normal Task to clear the Execution task configuration panel.
-
In the Label field, enter the
name you want to give to the task to be triggered. For example,
getting_started.
-
Click the
icon to open the Job filter, search for the Job to be run from Job Conductor, and select
its Latest version.
Once you have selected the Job, the Project, the Branch,
the Name, the Version and the Context
fields are all automatically filled with the related information of the
selected Job.
-
Select the Regenerate Job on
change check box to regenerate the selected Job before task
deployment and execution every time a modification is made to the Job
itself.
Note that if you selected Latest
version, the Job is regenerated whenever a new version of it is
created in the Studio, even if you did not select the
Regenerate Job on change check
box.
-
Select the server on which the task should be deployed.
-
Click Save to validate the
configuration.
This new task is added to the task list.
-
In the Job Conductor page,
click the getting_started task to select it, then on the
toolbar, click Generate so that the task
fetches the relevant Job script from the relevant project in the Talend Studio
Repository and generates the code.
Once done, the status of the task changes to Ready to deploy, meaning that the code
generated is now ready to be deployed on the execution server.
-
Click Deploy to deploy the Job
on the execution server.
Once done, the status changes to Ready
to run. This means that the server has received the Job and
is now ready to execute it.
-
Click Run to execute the
Job.
Once done, the status switches back to Ready to run, which means that the Job can be run again if
needed.
If the task does not complete properly, check the Error Status column as well as the task log for
the Job completion information.
After the run completes, you can check, for example in the web console of your HDFS
system, that the output has been written to HDFS.
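As an alternative to the web console, the output directory can be listed through the WebHDFS REST API. The following is a minimal Python sketch under several assumptions: WebHDFS is enabled on the cluster, the cluster uses simple (non-Kerberos) authentication, and the host, port, path and user are placeholders to be replaced with your own values. The function names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def liststatus_url(host, port, hdfs_path, user):
    """Build a WebHDFS LISTSTATUS URL for an absolute HDFS path."""
    query = urlencode({"op": "LISTSTATUS", "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


def list_output(host, port, hdfs_path, user):
    """Return the names of the entries found under hdfs_path."""
    url = liststatus_url(host, port, hdfs_path, user)
    with urlopen(url, timeout=10) as response:
        payload = json.load(response)
    # WebHDFS wraps the directory entries in FileStatuses/FileStatus.
    return [entry["pathSuffix"] for entry in payload["FileStatuses"]["FileStatus"]]


if __name__ == "__main__":
    # Placeholder values (assumptions): 9870 is the default NameNode HTTP
    # port in Hadoop 3.x; Hadoop 2.x uses 50070 by default.
    print(liststatus_url("namenode.example.com", 9870, "/user/talend/output", "talend"))
```

Calling `list_output` with the same arguments sends the request and returns the file names in the directory, so an empty list or an HTTP error suggests that the Job did not write its output where expected.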