Restartability with Talend Studio and Talend Administration Center
The ability to restart a Job from a point where it failed is called restartability. Whether you are programming in Talend or using any other language, checkpointing is a useful technique to ensure that your Jobs can be restarted and recover from a previous error.
Restartability in Talend can be accomplished using both Talend Studio and Talend Administration Center. In Talend Studio you need the OnSubjobOk and OnSubjobError trigger links, and in Talend Administration Center you need to create tasks in the Job scheduler or the execution plan. Together, these let you:
- Record the point of failure
- Resume from the point of failure without re-running successfully executed code
- Execute custom recovery code
- Perform a normal execution of previously non-executed code
You can set checkpoints on one or more OnSubjobOk or OnSubjobError trigger links used to connect components in your Job design. In case of failure during execution, this allows you to resume the execution of your Job from the last checkpoint before the error. Therefore, checkpoints within Job designs can be defined as reference points that can precede or follow a failure point during Job execution.
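The idea behind checkpoints can be illustrated outside of Talend with a minimal Java sketch. This is not Talend's generated code or API: the `CheckpointRunner` class, its method names, and the step names are hypothetical, and the checkpoint is kept in memory only, whereas a real scheduler would persist it between executions. Each step stands in for a subjob, and the runner records the failed step, skips steps that already succeeded, runs optional recovery code, and then continues normally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of checkpoint-based restart; not Talend's API.
class CheckpointRunner {
    // Ordered steps, standing in for subjobs chained by OnSubjobOk links.
    private final Map<String, Runnable> steps = new LinkedHashMap<>();
    // Optional recovery code to run before resuming at a given step.
    private final Map<String, Runnable> recovery = new LinkedHashMap<>();
    // Step that failed in the previous run, or null if there is nothing to resume.
    private String failedStep;

    void addStep(String name, Runnable body) { steps.put(name, body); }
    void addRecovery(String name, Runnable body) { recovery.put(name, body); }
    String getFailedStep() { return failedStep; }

    // Runs from the recorded failure point, or from the first step if none is recorded.
    // Returns the names of the steps that completed during this call.
    List<String> run() {
        List<String> executed = new ArrayList<>();
        boolean skipping = failedStep != null;
        for (Map.Entry<String, Runnable> entry : steps.entrySet()) {
            String name = entry.getKey();
            if (skipping) {
                if (!name.equals(failedStep)) {
                    continue; // succeeded in an earlier run, do not re-run
                }
                skipping = false;
                Runnable fix = recovery.get(name);
                if (fix != null) {
                    fix.run(); // custom recovery code before resuming
                }
            }
            try {
                entry.getValue().run();
                executed.add(name);
            } catch (RuntimeException ex) {
                failedStep = name; // record the point of failure
                return executed;
            }
        }
        failedStep = null; // all remaining steps completed
        return executed;
    }

    public static void main(String[] args) {
        boolean[] inputReady = {false};
        CheckpointRunner job = new CheckpointRunner();
        job.addStep("load", () -> {});
        job.addStep("transform", () -> {
            if (!inputReady[0]) throw new IllegalStateException("input missing");
        });
        job.addStep("export", () -> {});
        job.addRecovery("transform", () -> inputReady[0] = true);

        System.out.println(job.run() + " failed at " + job.getFailedStep());
        // prints: [load] failed at transform
        System.out.println(job.run() + " failed at " + job.getFailedStep());
        // prints: [transform, export] failed at null
    }
}
```

In the second call, `load` is not executed again: the run resumes at `transform`, the recorded point of failure, after the recovery code has run.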
- Go to the component view of the OnSubjobOk link.
Select the check box to the left of Recovery Checkpoint in the Error recovery tab.
This is your restartability point.
- Give the instructions to follow in case the Job fails.
In this example, the instruction given is to restart the Job.
- In Talend Administration Center, go to Job Conductor or Execution Plan.
Schedule and run this Job.
In the example below, the Job was scheduled in the Job Conductor and failed during the last execution.
Click Real Time Statistics to see the point of failure.
Open the task recovery module on the Task Execution Monitoring page.
The recovery module shows the failure instructions provided in the Job design, the point of restart (tFileInputDelimited_2 in this case), and the error that the Job encountered.
Click the launch recovery option to restart the Job.
The Job starts from the checkpoint and completes successfully.
The first image below shows the files at the time of the Job failure, and the second image shows the files after the Job was restarted from the checkpoint.