Creating a match rule - Cloud - 8.0

Talend Studio User Guide

Available in...

Big Data Platform

Cloud API Services Platform

Cloud Big Data Platform

Cloud Data Fabric

Cloud Data Management Platform

Data Fabric

Data Management Platform

Data Services Platform

MDM Platform

Real-Time Big Data Platform

In data quality, match rules are used to compare a set of columns and create groups of similar records using blocking and matching keys and/or survivorship functions.
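As a rough illustration of how blocking and matching keys work together, the following Python sketch first partitions records by a blocking key and then compares only the records that share a block. It does not show how Talend Studio implements match rules: the record fields, the `blocking_key` and `similarity` helpers, and the 0.8 threshold are all hypothetical choices made for this example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record):
    # Hypothetical blocking key: the first three characters of the postal code.
    return record["zip"][:3]


def similarity(a, b):
    # Hypothetical matching function: case-insensitive string similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matching_pairs(records, threshold=0.8):
    # Partition the records into blocks so that only records sharing
    # a blocking key are compared with each other.
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)

    # Within each block, apply the matching key (here, the name column).
    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similarity(a["name"], b["name"]) >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs


records = [
    {"id": 1, "name": "John Smith", "zip": "75001"},
    {"id": 2, "name": "Jon Smith", "zip": "75002"},
    {"id": 3, "name": "Mary Jones", "zip": "75003"},
    {"id": 4, "name": "John Smith", "zip": "10001"},
]

pairs = matching_pairs(records)
print(pairs)
```

In this sketch, records 1 and 2 are paired because they share a block and have similar names. Record 4 has the same name as record 1 but is never compared with it, because its blocking key differs. This is the trade-off that blocking introduces: fewer comparisons, at the risk of missing matches across blocks.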

From the Profiling perspective, you can create match rules with the VSR or the T-Swoosh algorithm and save them in the Talend Studio repository. Once the rules are centralized in the repository, you can import them into the match analysis editor and test them on your data to group duplicate records. For further information about the match analysis, see Creating a match analysis.

You can also import rules defined with the VSR algorithm into the tMatchGroup configuration wizard and into other match components, including the Hadoop components, and use these rules in match Jobs. For further information, see the tMatchGroup documentation.

The two algorithms produce different match results for two reasons:
  1. With the VSR algorithm, the master record is simply the first input record. The list of match groups may therefore depend on the order of the input records.

  2. With the VSR algorithm, the output records do not change, whereas the T-Swoosh algorithm creates new records.
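The two differences can be illustrated with a deliberately simplified Python sketch. It is not Talend's implementation of either algorithm: `group_first_master` only mimics the idea of taking the first record as the master, `group_with_merge` only mimics the idea of merging matched records into a new record, and the match rule, the `merge` survivorship function and the sample records are hypothetical.

```python
def is_match(a, b):
    # Hypothetical match rule: two records match when they share
    # a non-empty phone number or a non-empty email address.
    return any(a[k] is not None and a[k] == b[k] for k in ("phone", "email"))


def group_first_master(records):
    # First-record-as-master grouping: each record is compared only with the
    # master (first member) of each existing group. Records are never modified.
    groups = []
    for rec in records:
        for group in groups:
            if is_match(group[0], rec):
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


def merge(a, b):
    # Hypothetical survivorship function: keep the first non-empty value per field.
    return {k: a[k] if a[k] is not None else b[k] for k in a}


def group_with_merge(records):
    # Merge-based grouping: when a record matches a master, both are replaced
    # by a newly created merged record, which is then compared again.
    masters = []
    for rec in records:
        current = rec
        merged = True
        while merged:
            merged = False
            for m in masters:
                if is_match(m, current):
                    masters.remove(m)
                    current = merge(m, current)
                    merged = True
                    break
        masters.append(current)
    return masters


r1 = {"name": "J. Smith", "phone": "555-0100", "email": None}
r2 = {"name": "John Smith", "phone": "555-0100", "email": "js@example.com"}
r3 = {"name": "John Smith", "phone": None, "email": "js@example.com"}

groups_a = group_first_master([r1, r2, r3])  # r1 is the master, r3 does not match it
groups_b = group_first_master([r2, r1, r3])  # r2 is the master, both others match it
masters = group_with_merge([r1, r2, r3])

print(len(groups_a), len(groups_b), masters)
```

With the first-record approach, the same three records form two groups in one input order and a single group in the other, because `r3` shares no value with `r1`. With the merge-based approach, `r1` and `r2` are combined into a new record that carries both the phone number and the email address, so `r3` joins the same group in this example. The resulting master is a record that did not exist in the input.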