tMatchGroup

tMatchGroup - 7.3

Data matching with Talend tools

Version

7.3

Language

English

Product

Talend Big Data Platform

Talend Data Fabric

Talend Data Management Platform

Talend Data Services Platform

Talend MDM Platform

Talend Real-Time Big Data Platform

Module

Talend Studio

Last publication date

2024-02-06

Creates groups of similar data records in any source data, including large volumes of data, by using one or several match rules.

tMatchGroup compares columns in both standard input data flows and MapReduce (M/R) input data flows using matching methods, and groups the similar duplicates it encounters together.
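To make "match rule" more concrete, here is a minimal Python sketch of how a rule might score a pair of records. It assumes that a rule is a set of weighted match keys (a column plus a similarity function) together with a match threshold, and that, when several rules exist, a pair matches if any one rule reaches its threshold. The names `MatchKey`, `MatchRule`, `is_match`, and `name_and_city` are hypothetical and are not part of Talend's API. The normalized Levenshtein similarity is one common choice, not necessarily the one a given tMatchGroup configuration uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalize the edit distance to a score in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


@dataclass
class MatchKey:  # hypothetical: one compared column and how to compare it
    column: str
    similarity: Callable[[str, str], float]
    weight: float = 1.0


@dataclass
class MatchRule:  # hypothetical: weighted match keys plus a match threshold
    keys: List[MatchKey]
    threshold: float

    def score(self, r1: Record, r2: Record) -> float:
        total = sum(k.weight for k in self.keys)
        weighted = sum(k.weight * k.similarity(r1[k.column], r2[k.column])
                       for k in self.keys)
        return weighted / total

    def matches(self, r1: Record, r2: Record) -> bool:
        return self.score(r1, r2) >= self.threshold


def is_match(rules: List[MatchRule], r1: Record, r2: Record) -> bool:
    # Assumption: with several rules, a pair matches if any single rule matches.
    return any(rule.matches(r1, r2) for rule in rules)


name_and_city = MatchRule(
    keys=[MatchKey("name", levenshtein_similarity, weight=2.0),
          MatchKey("city", exact, weight=1.0)],
    threshold=0.8,
)
```

For example, "Jon Smith" and "John Smith" in the same city differ by one edit, so the name similarity is 0.9 and the weighted score is (2 × 0.9 + 1 × 1.0) / 3 ≈ 0.93, which clears the 0.8 threshold.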

Several tMatchGroup components can be used sequentially to match data against different blocking keys. This refines the groups received by each tMatchGroup component, because each blocking key creates a different data partition that overlaps the data blocks produced by the previous components.
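The effect of using different blocking keys in successive passes can be illustrated with a small Python sketch. Here blocking simply partitions records by a key value, so that only records in the same block are compared. The helpers `partition`, `by_zip`, and `by_name_prefix` and the sample records are hypothetical. The sketch only shows how the partitions overlap; it does not carry groups from one pass to the next as chained tMatchGroup components would.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = Dict[str, str]


def partition(records: List[Record],
              blocking_key: Callable[[Record], str]) -> Dict[str, List[Record]]:
    """Split records into blocks; only records in the same block get compared."""
    blocks: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)
    return dict(blocks)


def by_zip(record: Record) -> str:  # hypothetical first blocking key
    return record["zip"]


def by_name_prefix(record: Record) -> str:  # hypothetical second blocking key
    return record["name"][:3].upper()


records = [
    {"id": "1", "name": "John Smith", "zip": "75001"},
    {"id": "2", "name": "Jon Smith", "zip": "75001"},
    {"id": "3", "name": "John Smith", "zip": "69001"},
]

for key in (by_zip, by_name_prefix):
    blocks = partition(records, key)
    print(key.__name__, {k: [r["id"] for r in v] for k, v in blocks.items()})
```

Records 1 and 3 never share a block in the first pass because their zip codes differ, but they fall into the same block in the second pass, so only a second component keyed on the name prefix gets a chance to compare them.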

When a group is defined, the first record processed in it becomes the master record of the group. The distance between each remaining record and the master records is then computed, and each record is assigned to the appropriate master record accordingly.
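The master-record logic can be sketched in Python as follows. This is a simplified illustration under stated assumptions: similarity is `difflib.SequenceMatcher.ratio()` on a lowercased name column, used as a stand-in rather than one of the algorithms tMatchGroup offers, and a record joins the most similar master whose score reaches a hypothetical threshold of 0.8. `group_records` and `name_similarity` are illustrative names only.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

Record = Dict[str, str]


def name_similarity(r1: Record, r2: Record) -> float:
    """Stand-in similarity in [0, 1]; not one of tMatchGroup's algorithms."""
    return SequenceMatcher(None, r1["name"].lower(), r2["name"].lower()).ratio()


def group_records(records: List[Record],
                  similarity: Callable[[Record, Record], float],
                  threshold: float) -> List[List[Record]]:
    """Group records; element 0 of each group is its master record."""
    groups: List[List[Record]] = []
    for record in records:
        best_index, best_score = None, -1.0
        # Compare the record with each existing master only, not every member.
        for i, group in enumerate(groups):
            score = similarity(record, group[0])
            if score >= threshold and score > best_score:
                best_index, best_score = i, score
        if best_index is None:
            groups.append([record])  # first record of a new group becomes its master
        else:
            groups[best_index].append(record)
    return groups


records = [{"name": n} for n in ("John Smith", "Jon Smith", "Jane Doe", "Jon Smyth")]
groups = group_records(records, name_similarity, threshold=0.8)
for group in groups:
    print("master:", group[0]["name"], "| members:", [r["name"] for r in group[1:]])
```

With this sample data, "Jon Smith" and "Jon Smyth" join the group whose master is "John Smith", while "Jane Doe" scores too low against that master and starts a new group as its own master. Because records are only compared with masters, the processing order affects which record becomes the master and therefore the final groups.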

Depending on the Talend product you are using, this component can be used in one, some or all of the following Job frameworks:

Standard: see tMatchGroup Standard properties.

The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real-Time Big Data Platform, Talend Data Services Platform, and in Talend Data Fabric.

MapReduce: see tMatchGroup MapReduce properties (deprecated).

The component in this framework is available in all Talend Platform products with Big Data and in Talend Data Fabric.