How to set checkpoints in MapReduce Jobs - 6.2

Talend Data Fabric Studio User Guide

Version: 6.2
Product: Talend Data Fabric
Task: Data Quality and Preparation; Design and Development
Platform: Talend Studio

You can set checkpoints in a MapReduce Job so that, if the Job fails during execution, it can be restarted from the last checkpoint before the error instead of from the beginning. This feature is particularly useful for large Jobs with multiple execution steps.

The following image shows an example of a checkpoint set on a MapReduce Job in the Studio.

In this example, a checkpoint (represented by an icon) is set between the two Subjobs. If an execution error occurs, you can use Talend Administration Center to restart the Job from this checkpoint.

Note that a checkpoint can be placed only on a Trigger link between the Subjobs of your Job, and the Job must be hosted in a remote project from Talend Administration Center.
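To make the restart rule concrete, the following Python sketch models which Subjob execution resumes from after a failure. It is a toy model under stated assumptions, not Talend's implementation: the function name `restart_index` and the convention that a checkpoint numbered k sits on the Trigger link just before Subjob k are hypothetical. The actual restart is performed from Talend Administration Center.

```python
from typing import Iterable


def restart_index(checkpoints: Iterable[int], failed_subjob: int) -> int:
    """Return the index of the Subjob to restart from (hypothetical model).

    Assumption: checkpoint k sits on the Trigger link just before Subjob k,
    so it has been passed once Subjob k starts. Execution resumes at the last
    checkpoint passed before the failure, or at Subjob 0 if there is none.
    """
    passed = [k for k in checkpoints if k <= failed_subjob]
    return max(passed, default=0)


# A Job with three Subjobs (0, 1, 2) and a checkpoint before Subjob 1.
print(restart_index({1}, failed_subjob=2))  # 1: resume after the checkpoint
print(restart_index({1}, failed_subjob=0))  # 0: no checkpoint passed yet
```

Without any checkpoint, the model always returns 0, which corresponds to rerunning the whole Job from the beginning.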

To define a checkpoint in a Job containing Subjobs, proceed as follows:

  1. Click the OnSubjobOk link between the Subjobs where you want to set the checkpoint.

    The configuration tab of this link is displayed in the Component view.

  2. Click the Error recovery tab to open its view.

  3. Select the Recovery checkpoint check box and enter the metadata of this checkpoint in the Label and Failure instructions fields.

    If the Recovery checkpoint check box is greyed out (that is, it cannot be selected), make sure that the Job you are using is properly hosted in a remote project.

For further information about setting checkpoints in the Studio, see How to recover Job execution in case of failure.

You need Talend Administration Center to use the checkpoints you have set. For further information, see Talend Administration Center User Guide.