On the Job Conductor page of Talend Administration Center, you define an execution task that groups the script generation, deployment and execution phases of your MapReduce and Spark Batch Jobs.
Before you begin
Ensure that the client machine on which the Talend Jobs are executed can recognize the host names of the nodes of the Hadoop cluster to be used. For this purpose, add the IP address/hostname mapping entries for the services of that Hadoop cluster in the hosts file of the client machine.
In this use case, this machine is the one on which Talend Runtime is installed.
The Hadoop cluster to be used has been properly configured and is running.
The administrator of the cluster has granted read/write permissions to the username used to access the related data and directories in HDFS.
You have created the use case Jobs described in the previous sections and run them successfully from the Studio.
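The host name prerequisite above can be checked from the client machine before you create the task. The following is a minimal Python sketch, not part of the Talend tooling: the function name `check_hosts` is hypothetical, and you pass it the host names of your own cluster nodes. It relies only on the standard `socket` module, which consults the hosts file as well as DNS.

```python
import socket


def check_hosts(hosts, resolver=socket.gethostbyname):
    """Map each host name to its resolved IP address, or None if it cannot be resolved."""
    results = {}
    for host in hosts:
        try:
            results[host] = resolver(host)
        except OSError:
            # socket.gaierror (name resolution failure) is a subclass of OSError.
            results[host] = None
    return results


def unresolved(hosts, resolver=socket.gethostbyname):
    """Return the host names that still need an entry in the hosts file."""
    return [h for h, ip in check_hosts(hosts, resolver).items() if ip is None]
```

Calling `unresolved([...])` with the node names of your cluster returns an empty list when every name resolves. Any name it returns is a candidate for a new IP address/hostname entry in the hosts file.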
- Log in to Talend Administration Center with the account you have created in Setting up your first user and project.
- In the Menu tree view of Talend Administration Center, click Job Conductor to display the Job Conductor page.
- From the toolbar on the Job Conductor page, click Add > Normal Task to clear the Execution task configuration panel.
- In the Label field, enter the name you want to give to the task to be triggered. For example, getting_started.
- Click the icon to open a Job filter to search for the Job to be run from Job Conductor and select it from the filter using its Latest version.
For example, it can be the MapReduce Job described in Joining movie and director information using a MapReduce Job.
Once you have selected the Job, the Project, the Branch, the Name, the Version and the Context fields are all automatically filled with the related information of the selected Job.
- Select the Regenerate Job on change check box to regenerate the selected Job before task deployment and execution every time a modification is made to the Job.
Note that if you selected Latest version, the Job is regenerated whenever a new version of it is created in Studio, even if you did not select the Regenerate Job on change check box.
- Select the server on which the task should be deployed.
In this scenario, the server is the Talend Runtime service you have configured in Connecting Talend Runtime Container to Talend Administration Center.
- Click Save to validate the task configuration.
This new task is added to the task list.
- On the Job Conductor page, click the getting_started task to select it and, on the toolbar, click Generate to allow the task to fetch the relevant Job script in the relevant project from the Talend Studio Repository and generate the code.
Once done, the status of the task changes to Ready to deploy, meaning that the code generated is now ready to be deployed on the execution server.
- Click Deploy to deploy the Job on the execution server.
Once done, the status changes to Ready to run. This means that the server has received the Job and is now ready to execute it.
- Click Run to execute the Job.
Once done, the status switches back to Ready to run, which means that the Job can be run again if needed.
If the task did not complete properly, check the Error Status column as well as the task log for the Job completion information.
Once done, you can check, for example in the web console of your HDFS system, that the output has been written in HDFS.
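As an alternative to the web console, the output directory can be listed over the WebHDFS REST API. The following is a minimal Python sketch, and everything specific to your environment is an assumption: WebHDFS must be enabled on the NameNode, port 9870 is the usual default on Hadoop 3 (Hadoop 2 typically uses 50070), the cluster must not require Kerberos, and the function names are hypothetical. Pass the host name, output path and user name of your own cluster.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def liststatus_url(namenode_host, hdfs_path, port=9870, user=None):
    """Build a WebHDFS LISTSTATUS URL for an HDFS directory (port is an assumed default)."""
    url = f"http://{namenode_host}:{port}/webhdfs/v1{quote(hdfs_path)}?op=LISTSTATUS"
    if user:
        # user.name is the WebHDFS parameter used with simple (non-Kerberos) authentication.
        url += f"&user.name={quote(user)}"
    return url


def file_names(liststatus_json):
    """Extract the entry names from a LISTSTATUS JSON response body."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [status["pathSuffix"] for status in statuses]


def list_output(namenode_host, hdfs_path, **kwargs):
    """Query the NameNode and return the names found under hdfs_path."""
    with urlopen(liststatus_url(namenode_host, hdfs_path, **kwargs)) as response:
        return file_names(response.read().decode("utf-8"))
```

If the Job wrote its results, `list_output` returns a non-empty list of entries for the output directory configured in the Job.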