The differences between the Simple VSR matcher and the T-Swoosh algorithms - 6.5

Using tMatchGroup with the Simple VSR Matcher and T-Swoosh algorithms

author
Talend Documentation Team
EnrichVersion
6.5
EnrichPlatform
Talend Studio
The Simple VSR matcher and the T-Swoosh algorithms differ both in how they process the input data and in how they compare a record with a master record.

When processing the input data using the T-Swoosh algorithm, there may be more iterations than the number of input records because a merged record may be created on each iteration and added to the queue.
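The effect of re-queuing merged records on the iteration count can be sketched as follows. This is a minimal illustration, not Talend's implementation: the function names `count_iterations` and `same_first_and_last`, the match rule, and the representation of a record as a list of source values are all assumptions made for the example.

```python
from collections import deque


def same_first_and_last(a, b):
    """Hypothetical match rule: same first token and same last token."""
    ta, tb = a.split(), b.split()
    return ta[0] == tb[0] and ta[-1] == tb[-1]


def count_iterations(values, is_match):
    # Assumption: each queued record is the list of source values it was built from.
    queue = deque([v] for v in values)
    masters = []
    iterations = 0
    while queue:
        iterations += 1
        record = queue.popleft()
        for i, master in enumerate(masters):
            if any(is_match(a, b) for a in record for b in master):
                # Merge, drop the old master, and put the merged record back on the queue.
                queue.append(masters.pop(i) + record)
                break
        else:
            # No master matched: the record becomes a new master.
            masters.append(record)
    return iterations, masters
```

Under these assumptions, calling `count_iterations(["John Doe", "Jane Roe", "John B. Doe"], same_first_and_last)` runs four iterations for three input records, because the record created by merging "John Doe" and "John B. Doe" is taken from the queue and processed again.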

When comparing a record with a master record, the T-Swoosh algorithm does more comparisons per iteration than the Simple VSR matcher algorithm:
  • When using the Simple VSR matcher algorithm, the record from the queue is only compared with the value of the master record.

  • When using the T-Swoosh algorithm, the record from the queue is compared with both the value of the master record and the value of each of the records used to build this master record.

    In the example taken from The T-Swoosh algorithm, on iteration 5, the record "John Doe, John B. Doe" is compared with the record "Johnnie B. Doe". There is a match if at least one of the three strings "John Doe, John B. Doe", "John Doe", "John B. Doe" matches the string "Johnnie B. Doe".
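The two comparison strategies can be contrasted with a short sketch. It is illustrative only: the `similar` function uses Python's `difflib.SequenceMatcher` with a 0.8 threshold as a stand-in for whatever matching function and threshold are configured in tMatchGroup, and the function names `match_master_value_only` and `match_master_and_sources` are hypothetical.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """Stand-in similarity test (assumption, not a Talend matching function)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def match_master_value_only(queue_value, master_value, is_match=similar):
    # Simple VSR style: compare only with the value of the master record.
    return is_match(queue_value, master_value)


def match_master_and_sources(queue_value, master_value, sources, is_match=similar):
    # T-Swoosh style: compare with the master value and with each value
    # of the records used to build the master record.
    candidates = [master_value, *sources]
    return any(is_match(queue_value, candidate) for candidate in candidates)


master_value = "John Doe, John B. Doe"
sources = ["John Doe", "John B. Doe"]
queue_value = "Johnnie B. Doe"

only_master = match_master_value_only(queue_value, master_value)
with_sources = match_master_and_sources(queue_value, master_value, sources)
```

With this stand-in similarity test, `only_master` is false while `with_sources` is true, because "Johnnie B. Doe" is close enough to the source value "John B. Doe" even though it is not close enough to the merged master value. A different matching function or threshold could give a different outcome.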