The differences between the Simple VSR Matcher and the T-Swoosh algorithms - Cloud - 8.0

Data matching with Talend tools

Version
Cloud
8.0
Language
English
Product
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-06

When processing the input data using the T-Swoosh algorithm, there may be more iterations than the number of input records because a merged record may be created on each iteration and added to the queue.

This is one of the main differences between the Simple VSR Matcher and the T-Swoosh algorithms.
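The effect of re-queuing merged records on the iteration count can be sketched as follows. This is a simplified illustration of a queue-based merge loop, not Talend's implementation: the function `count_iterations`, the token-set records, and the `match` and `merge` rules are all hypothetical and chosen only to keep the example small.

```python
from collections import deque


def count_iterations(records, match, merge):
    """Run a simplified queue-based merge loop and count its iterations.

    Hypothetical sketch: when a record matches a master, the two are merged
    and the merged record goes back onto the queue for another pass.
    """
    queue = deque(records)
    masters = []
    iterations = 0
    while queue:
        iterations += 1
        current = queue.popleft()
        partner = next((m for m in masters if match(current, m)), None)
        if partner is None:
            # No match: the record becomes a master on its own.
            masters.append(current)
        else:
            # Match: replace the master by the merged record and re-queue it.
            masters.remove(partner)
            queue.append(merge(current, partner))
    return iterations, masters


# Toy rules (assumptions): records are token sets, two records match when
# they share at least one token, and merging is the union of the tokens.
records = [
    frozenset({"john", "doe"}),
    frozenset({"john", "b", "doe"}),
    frozenset({"jane", "roe"}),
]
iterations, masters = count_iterations(
    records,
    match=lambda a, b: bool(a & b),
    merge=lambda a, b: a | b,
)
print(iterations)  # 4, one more than the 3 input records
```

The extra iteration comes from the merged record being taken from the queue and compared again after it was created.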

When comparing a record with a master record, the T-Swoosh algorithm makes more comparisons per iteration than the Simple VSR Matcher algorithm:
  • When using the Simple VSR Matcher algorithm, the record from the queue is only compared with the value of the master record. There is no comparison between the record from the queue and the value of each of the records used to build this master record. For this reason, sort the input records so that the most trustworthy records appear first in the input data.
  • When using the T-Swoosh algorithm, the record from the queue is compared with the value of the master record and the value of each of the records used to build this master record, until records are considered a match.

    For an example of how to survive master records using the T-Swoosh algorithm, see The T-Swoosh algorithm.

    In this example, the record "John Doe, John B. Doe" is compared with the record "Johnnie B. Doe" on iteration 5. There is a match if at least one of the three strings "John Doe, John B. Doe", "John Doe" and "John B. Doe" matches the string "Johnnie B. Doe".
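The two comparison strategies described above can be contrasted with a minimal sketch. Everything here is an assumption made for illustration: the `Master` class, the two helper functions, and especially `toy_match`, which is a deliberately crude stand-in for a real matching function and does not reflect how Talend computes matches.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    value: str                                    # value of the master record
    sources: list = field(default_factory=list)  # records used to build it


def toy_match(a, b):
    """Hypothetical rule: same number of tokens, identical tokens after the
    first one, and one first token is a prefix of the other."""
    ta, tb = a.lower().split(), b.lower().split()
    if len(ta) != len(tb):
        return False
    first_ok = ta[0].startswith(tb[0]) or tb[0].startswith(ta[0])
    return first_ok and ta[1:] == tb[1:]


def matches_master_only(record, master, match=toy_match):
    # Simple VSR Matcher style: compare with the master value only.
    return match(record, master.value)


def matches_master_or_sources(record, master, match=toy_match):
    # T-Swoosh style: compare with the master value, then with each source
    # record, stopping at the first match (any() short-circuits).
    candidates = [master.value, *master.sources]
    return any(match(record, candidate) for candidate in candidates)


master = Master(
    value="John Doe, John B. Doe",
    sources=["John Doe", "John B. Doe"],
)
print(matches_master_only("Johnnie B. Doe", master))        # False
print(matches_master_or_sources("Johnnie B. Doe", master))  # True
```

With this toy rule, "Johnnie B. Doe" does not match the combined master value but does match the source record "John B. Doe", so only the second strategy reports a match.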