tMatchPairing - Cloud

Data matching with Talend tools

Version: Cloud, 8.0

Language: English

Product: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform

Module: Talend Studio

Last publication date: 2024-02-06

Enables you to compute pairs of suspect duplicates from any source data, including large volumes, in the context of machine learning on Spark.

This component reads a data set row by row, excludes unique rows and exact duplicates by writing them to separate files, computes pairs of suspect records based on a blocking key definition, and creates a sample of suspect records that is representative of the data set.
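The pairing steps described above can be illustrated with a minimal plain-Python sketch. This is not the component's implementation and does not run on Spark. The function names, the record fields, the blocking key (first three letters of the last name plus the zip code), and the rule that a row is "unique" when it is alone in its block are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical blocking key: first three letters of the last name + zip code.
    return record["last_name"][:3].lower() + "|" + record["zip"]


def pair_suspects(records, key_fn=blocking_key, sample_size=2, seed=0):
    # Step 1: set aside exact duplicates (rows identical to an earlier row).
    seen = {}
    exact_duplicates = []
    for rec in records:
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:
            exact_duplicates.append(rec)
        else:
            seen[fingerprint] = rec

    # Step 2: group the remaining rows by blocking key.
    blocks = defaultdict(list)
    for rec in seen.values():
        blocks[key_fn(rec)].append(rec)

    # Step 3: rows alone in their block are treated as unique (assumption);
    # every pair of rows sharing a block becomes a suspect pair.
    unique_rows = []
    suspect_pairs = []
    for members in blocks.values():
        if len(members) == 1:
            unique_rows.append(members[0])
        else:
            suspect_pairs.extend(combinations(members, 2))

    # Step 4: draw a random sample of suspect pairs for labeling.
    rng = random.Random(seed)
    sample = rng.sample(suspect_pairs, min(sample_size, len(suspect_pairs)))
    return unique_rows, exact_duplicates, suspect_pairs, sample


records = [
    {"first_name": "John", "last_name": "Smith", "zip": "10001"},
    {"first_name": "Jon", "last_name": "Smith", "zip": "10001"},
    {"first_name": "John", "last_name": "Smith", "zip": "10001"},  # exact duplicate
    {"first_name": "Ann", "last_name": "Lee", "zip": "94105"},
]
unique_rows, exact_duplicates, suspect_pairs, sample = pair_suspects(records)
```

In this sketch the blocking key limits comparisons to rows that share a key, which is what keeps pairing tractable on large volumes: only rows within the same block are ever paired.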

You can label suspect pairs manually or load them into a Grouping campaign that is already defined in Talend Data Stewardship.

In local mode, Apache Spark 2.4.0 and later versions are supported.

This component is not shipped with your Talend Studio by default. You need to install it using the Feature Manager. For more information, see Installing features using the Feature Manager.