Creating a match rule - 7.1

Talend Open Studio for Data Quality User Guide

Talend Documentation Team
In data quality, match rules are used to compare a set of columns and create groups of similar records using blocking and matching keys and/or survivorship functions.

From the studio, you can create match rules with the VSR or the T-Swoosh algorithm and save them in the studio repository. Once the rules are centralized in the repository, you can import them into the match analysis editor and test them on your data to group duplicate records. For further information about match analyses, see Creating a match analysis.

The two algorithms produce different match results for two reasons:
  • First, with the VSR algorithm, the master record of a group is simply the first input record assigned to it. The list of match groups may therefore depend on the order of the input records.

  • Second, the VSR algorithm does not change the output records, whereas the T-Swoosh algorithm creates new records.
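
The two differences above can be illustrated with a small toy sketch. It is not the studio's implementation of either algorithm: the matching rule (two records match when they share a word), the function names and the merge step (a union of words standing in for a survivorship function) are all assumptions made for illustration only.

```python
def tokens(record):
    """Split a record into a set of lowercase words (toy representation)."""
    return set(record.lower().split())


def is_match(a, b):
    """Hypothetical matching rule: two records match if they share a word."""
    return bool(tokens(a) & tokens(b))


def group_first_as_master(records):
    """First-record-as-master grouping (simplified, VSR-like behavior).

    The first record of each group acts as its master. Every later record
    is compared with the masters only, and the records are never modified.
    """
    groups = []
    for rec in records:
        for group in groups:
            if is_match(group[0], rec):  # group[0] is the master
                group.append(rec)
                break
        else:
            groups.append([rec])  # no master matched: rec starts a new group
    return groups


def group_with_merging(records):
    """Merge-based grouping (simplified, Swoosh-like behavior).

    Each match produces a NEW master built from both records, and that
    merged master is the one compared with the remaining groups.
    """
    groups = []
    for rec in records:
        merged = {"master": tokens(rec), "members": [rec]}
        changed = True
        while changed:  # repeat until the merged master matches no group
            changed = False
            for group in groups:
                if group["master"] & merged["master"]:
                    merged = {
                        "master": group["master"] | merged["master"],
                        "members": group["members"] + merged["members"],
                    }
                    groups.remove(group)
                    changed = True
                    break
        groups.append(merged)
    return groups


print(group_first_as_master(["John", "John Smith", "Smith"]))
print(group_first_as_master(["John Smith", "John", "Smith"]))
print(group_with_merging(["John", "John Smith", "Smith"]))
```

With the first-record-as-master function, the input order `John, John Smith, Smith` gives two groups because `Smith` is only compared with the master `John`, while the order `John Smith, John, Smith` gives a single group. The merge-based function returns one group for the first order, because the newly created master contains both words.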