The Parallelization tab - Cloud - 8.0

Talend Studio User Guide

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-29
Available in: Big Data, Big Data Platform, Cloud API Services Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Cloud Data Management Platform, Data Fabric, Data Management Platform, Data Services Platform, MDM Platform, Real-Time Big Data Platform

The Parallelization tab is one of the settings tabs you can use to configure a Row connection.

Figure: Parallelization view.

You define the parallelization properties of your Row connections using the options described in the following table.

Field/Option Description
Partition row Select this option when you need to partition the input records into a specific number of threads.
Note:

This option is not available on the last Row connection of the flow.

Departition row Select this option when you need to regroup the outputs of the processed parallel threads.
Note:

This option is not available on the first Row connection of the flow.

Repartition row Select this option when you need to partition the input threads into a specific number of threads and regroup the outputs of the processed parallel threads.
Note:

This option is not available on the first or the last Row connection of the flow.

None Default option. Select this option when you do not want to take any action on the input records.
Merge sort partitions Select this check box to apply the merge sort algorithm when regrouping the partitions, to ensure the consistency of the data.

This check box appears when you select the Departition row or Repartition row option.

Number of Child Threads Type in the number of threads into which you want to split the input records.

This field appears when you select the Partition row or Repartition row option.

Buffer Size Type in the number of rows to cache for each of the threads generated.

This field does not appear if you select the None option.

Use a key hash for partitions Select this check box to dispatch the input records in hash mode, which ensures that records with the same values in the key columns are dispatched to the same thread. If you clear this check box, the records are dispatched in round-robin mode.

This check box appears if you select the Partition row or Repartition row option.

In the Key Columns table that appears after you select the check box, set the columns on which you want to use the hash mode.
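The following minimal Python sketch illustrates what a Partition row / Departition row pair does conceptually and how the options above interact. It is a model under stated assumptions, not Talend's implementation. The function run_parallel and its parameters are hypothetical. Buffer Size is modeled as a bounded queue per child thread. Merge sort partitions is modeled as a k-way merge, which assumes that each child thread outputs its rows already sorted by the merge key.

```python
# Conceptual sketch only: this is NOT the Java code that Talend Studio
# generates. run_parallel and its parameters are hypothetical names.
import heapq
import queue
import threading
from itertools import cycle
from typing import Callable, Dict, Iterable, List, Optional, Sequence

Row = Dict[str, object]
_DONE = object()  # sentinel: tells a child thread that its input has ended


def run_parallel(
    rows: Iterable[Row],
    process_partition: Callable[[List[Row]], List[Row]],
    num_child_threads: int,  # ~ Number of Child Threads
    buffer_size: int,  # ~ Buffer Size (assumed >= 1)
    key_columns: Sequence[str] = (),  # ~ Key Columns; empty means round-robin
    merge_sort_key: Optional[Callable[[Row], object]] = None,  # ~ Merge sort partitions
) -> List[Row]:
    """Partition rows across child threads, process them, then departition."""
    # One bounded queue per child thread: put() blocks once buffer_size
    # rows are waiting, so at most buffer_size rows are cached per thread.
    inboxes = [queue.Queue(maxsize=buffer_size) for _ in range(num_child_threads)]
    results: List[List[Row]] = [[] for _ in range(num_child_threads)]

    def child(index: int) -> None:
        received: List[Row] = []
        while (row := inboxes[index].get()) is not _DONE:
            received.append(row)
        results[index] = process_partition(received)

    threads = [threading.Thread(target=child, args=(i,))
               for i in range(num_child_threads)]
    for t in threads:
        t.start()

    # Partition row: send each record to exactly one child thread.
    round_robin = cycle(range(num_child_threads))
    for row in rows:
        if key_columns:
            # Hash mode: equal key values always map to the same thread.
            key = tuple(row[c] for c in key_columns)
            target = hash(key) % num_child_threads
        else:
            target = next(round_robin)
        inboxes[target].put(row)
    for inbox in inboxes:
        inbox.put(_DONE)
    for t in threads:
        t.join()

    # Departition row: regroup the child outputs into a single flow.
    if merge_sort_key is not None:
        # k-way merge; assumes every partition is already sorted by this key.
        return list(heapq.merge(*results, key=merge_sort_key))
    return [row for part in results for row in part]


if __name__ == "__main__":
    rows = [{"id": i, "country": c}
            for i, c in enumerate(["FR", "US", "FR", "DE", "US", "FR"])]

    def sort_by_id(part: List[Row]) -> List[Row]:
        return sorted(part, key=lambda r: r["id"])

    merged = run_parallel(rows, sort_by_id, num_child_threads=2,
                          buffer_size=100, key_columns=["country"],
                          merge_sort_key=lambda r: r["id"])
    print([r["id"] for r in merged])  # [0, 1, 2, 3, 4, 5]
```

In this model, an empty key_columns switches the dispatch to round-robin, and omitting merge_sort_key simply concatenates the child outputs, so rows coming from different threads no longer follow the original order.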