Executing multiple subJobs in parallel - Cloud - 8.0

Talend Studio User Guide

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-27

The Multi thread execution feature allows you to run in parallel multiple subJobs that are active in the workspace.

Warning: With this feature enabled, the implementation of the globalMap object is synchronized. This protects your Job from thread-safety issues when you use global variables created via the globalMap object, but it can also increase the risk of performance issues and deadlocks. For more information about the globalMap object, see Using contexts and variables.
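
To illustrate the trade-off described in this warning, the following is a minimal Java sketch, not code generated by Talend Studio. It assumes that a synchronized globalMap behaves like a map wrapped with `Collections.synchronizedMap`; the class name `GlobalMapSketch`, the method `runCounters`, and the key `rowCount` are hypothetical. It shows that individual calls are thread-safe, and that an update made of a read followed by a write must still be performed as one guarded operation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Job whose parallel subJobs share global variables.
class GlobalMapSketch {

    // Assumption: a synchronized wrapper models the synchronized globalMap.
    static final Map<String, Object> globalMap =
            Collections.synchronizedMap(new HashMap<String, Object>());

    // Starts `threads` workers that each add 1 to a shared counter
    // `incrementsPerThread` times, then returns the final counter value.
    static int runCounters(int threads, int incrementsPerThread) {
        globalMap.put("rowCount", 0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    // A get() followed by a put() would be two separate locked
                    // calls, so another thread could slip in between them.
                    // merge() on a synchronized map holds the lock for the
                    // whole read-modify-write step.
                    globalMap.merge("rowCount", 1,
                            (oldValue, one) -> (Integer) oldValue + (Integer) one);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
        return (Integer) globalMap.get("rowCount");
    }

    public static void main(String[] args) {
        System.out.println("rowCount = " + runCounters(4, 10_000));
    }
}
```

Every call in this sketch goes through a single lock, which is where the performance cost mentioned in the warning comes from: threads that touch the map frequently end up waiting for each other.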

As explained in the previous sections, a Job opened in the workspace can contain several subJobs, and you can arrange their execution order using trigger links such as OnSubjobOK. However, when the subJobs do not have any dependencies between them, you might want to launch them at the same time. For example, the following image presents four subJobs within a Job, with no dependencies between them.

Design workspace with multiple subJobs.

In this example, each subJob uses a tRunJob component to call the Job it represents.

With the Job opened in the workspace, proceed as follows to run the subJobs in parallel:

Procedure

  1. Click the Job tab, then the Extra tab to display it.
  2. Select the Multi thread execution check box to enable the parallel execution.
    This feature works best when the number of threads (in general, one subJob counts as one thread) does not exceed the number of processors of the machine you use for parallel executions. Otherwise, some of the subJobs have to wait until a processor is freed up.
  3. If needed, fill the Parallelize Buffer Unit Size field with the number of rows you want to buffer for each of the threads handled in parallel before the data is processed and the buffer is cleaned.
    This setting is meaningful only if the Enable parallel execution check box is selected and the child Jobs or subJobs contain database output components.
    For a use case of using the Multi-thread Execution feature to run Jobs in parallel, see .
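
The note in step 2 about matching threads to processors can be sketched in plain Java. This is an illustration under stated assumptions, not the code Talend Studio generates: the class `ParallelSubJobsSketch`, the method `runAll`, and the subJob names are hypothetical, and each subJob is reduced to a task that returns a string. The sketch caps a thread pool at a given size, so that when there are more subJobs than threads, the extra ones wait in the queue until a thread is free.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of independent subJobs run in parallel.
class ParallelSubJobsSketch {

    // Submits one task per subJob to a pool of at most `maxThreads` threads
    // and returns the results in submission order.
    static List<String> runAll(List<String> subJobNames, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String name : subJobNames) {
                // Placeholder work: a real subJob would process data here.
                pending.add(pool.submit(() -> name + " finished"));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get()); // blocks until that subJob is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A subJob failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Cap the pool at the processor count of the current machine.
        int processors = Runtime.getRuntime().availableProcessors();
        List<String> subJobs =
                Arrays.asList("subJob1", "subJob2", "subJob3", "subJob4");
        for (String line : runAll(subJobs, processors)) {
            System.out.println(line);
        }
    }
}
```

With four subJobs on a machine that has two processors, a pool capped at two threads would start two tasks immediately and hold the other two until a thread becomes available, which mirrors the waiting behavior described in step 2.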