The Parallelization tab - Cloud - 7.3

Talend Studio User Guide

Version: Cloud, 7.3
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-13
Available in: Big Data, Big Data Platform, Cloud API Services Platform, Cloud Big Data, Cloud Big Data Platform, Cloud Data Fabric, Cloud Data Management Platform, Data Fabric, Data Management Platform, Data Services Platform, MDM Platform, Real-Time Big Data Platform

The Parallelization tab is one of the settings tabs you can use to configure a Row connection.

You define the parallelization properties on your row connections according to the following table.

Field/Option

Description

Partition row

Select this option when you need to partition the input records into a specific number of threads.

Note:

This option is not available for the last row connection of the flow.

Departition row

Select this option when you need to regroup the outputs of the processed parallel threads.

Note:

This option is not available for the first row connection of the flow.

Repartition row

Select this option when you need to partition the input threads into a specific number of threads and regroup the outputs of the processed parallel threads.

Note:

This option is not available for the first or the last row connection of the flow.

None

Default option. Select this option when you do not want to take any action on the input records.

Merge sort partitions

Select this check box to apply the merge sort algorithm, which ensures the consistency of the data.

This check box appears when you select the Departition row or Repartition row option.

Number of Child Threads

Type in the number of threads into which you want to split the input records.

This field appears when you select the Partition row or Repartition row option.

Buffer Size

Type in the number of rows to cache for each of the threads generated.

This field does not appear if you select the None option.

Use a key hash for partitions

Select this check box to dispatch the input records in hash mode, which ensures that records meeting the same criteria are dispatched to the same thread. Otherwise, the dispatch mode is round-robin.

This check box appears if you select the Partition row or Repartition row option.

In the Key Columns table that appears after you select the check box, set the columns on which you want to use the hash mode.
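
The difference between the two dispatch modes described in the table can be illustrated with a minimal Java sketch. This is not the code that Talend Studio generates: the class PartitionSketch, its methods, and the sample rows are hypothetical, and the hash function is an assumption chosen only for illustration. The sketch shows that round-robin spreads records evenly across the child threads regardless of content, while key hashing sends every record with the same key column values to the same partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the code Talend Studio generates.
class PartitionSketch {

    // Creates one empty bucket per child thread ("Number of Child Threads").
    static List<List<String[]>> emptyPartitions(int childThreads) {
        List<List<String[]>> partitions = new ArrayList<>();
        for (int i = 0; i < childThreads; i++) {
            partitions.add(new ArrayList<>());
        }
        return partitions;
    }

    // Round-robin dispatch: the i-th record goes to bucket i % childThreads.
    static List<List<String[]>> roundRobin(List<String[]> rows, int childThreads) {
        List<List<String[]>> partitions = emptyPartitions(childThreads);
        for (int i = 0; i < rows.size(); i++) {
            partitions.get(i % childThreads).add(rows.get(i));
        }
        return partitions;
    }

    // Key-hash dispatch: records with equal values in the key columns
    // always land in the same bucket. The hash function is an assumption.
    static List<List<String[]>> keyHash(List<String[]> rows, int childThreads, int... keyColumns) {
        List<List<String[]>> partitions = emptyPartitions(childThreads);
        for (String[] row : rows) {
            String[] key = new String[keyColumns.length];
            for (int k = 0; k < keyColumns.length; k++) {
                key[k] = row[keyColumns[k]];
            }
            // floorMod keeps the index non-negative for negative hash codes.
            int target = Math.floorMod(Arrays.hashCode(key), childThreads);
            partitions.get(target).add(row);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Sample rows: column 0 is a country code used as the key column.
        List<String[]> rows = Arrays.asList(
            new String[] {"FR", "Alice"},
            new String[] {"US", "Bob"},
            new String[] {"FR", "Chloe"},
            new String[] {"US", "Dan"},
            new String[] {"FR", "Emma"});

        List<List<String[]>> rr = roundRobin(rows, 2);
        System.out.println("round-robin sizes: " + rr.get(0).size() + ", " + rr.get(1).size());

        List<List<String[]>> kh = keyHash(rows, 2, 0);
        for (int t = 0; t < kh.size(); t++) {
            StringBuilder names = new StringBuilder();
            for (String[] row : kh.get(t)) {
                names.append(row[0]).append(':').append(row[1]).append(' ');
            }
            System.out.println("key-hash thread " + t + ": " + names.toString().trim());
        }
    }
}
```

With round-robin, the five sample rows split into buckets of three and two. With key hashing on column 0, all FR rows share one bucket and all US rows share one bucket, although two distinct keys may still hash to the same bucket.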