The Simple VSR Matcher algorithm - Cloud - 8.0

Data matching with Talend tools

Version
Cloud
8.0
Language
English
Product
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-06
The Simple VSR Matcher algorithm compares each record within the same block with the master records already stored in the lookup table.

If a record does not match any of the existing master records, it is considered a new master record and is added to the lookup table. This means that the first record of the dataset is necessarily a master record. The order of the records therefore matters: it can change which records become master records.
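The effect of record order can be illustrated with a minimal sketch. It is not Talend code: the function `group_in_order`, the numeric records, and the `tolerance` parameter are hypothetical stand-ins for real records and a real matching function. Two values "match" when they differ by at most `tolerance`.

```python
def group_in_order(values, tolerance=1):
    """Greedy grouping: each value joins the first master within `tolerance`.

    Toy stand-in for record matching; numbers replace real records.
    """
    groups = {}  # master value -> list of members (master first)
    for value in values:
        for master in groups:
            if abs(value - master) <= tolerance:
                groups[master].append(value)
                break  # stop at the first matching master
        else:
            # No master matched: the value becomes a new master record.
            groups[value] = [value]
    return groups


print(group_in_order([1, 2, 3]))  # {1: [1, 2], 3: [3]}
print(group_in_order([2, 1, 3]))  # {2: [2, 1, 3]}
```

The same three values produce two groups in one order and a single group in the other, because the first value processed always becomes a master record.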

When a record matches a master record, the Simple VSR Matcher algorithm does not attempt to match it with the other master records. This is because no two master records in the lookup table are similar to each other, so once a record matches one master record, the chance of it matching another is low.

This means a record can only exist in one group of records and be linked to one master record.

For example, take the following set of records as input:

| id | fullName |
|----|----------|
| 1 | John Doe |
| 2 | Donna Lewis |
| 3 | John B. Doe |
| 4 | Louis Armstrong |

The algorithm processes the input records as follows:

  1. The algorithm takes record 1 and compares it with an empty set of records. Since record 1 does not match any record, it is added to the lookup table.
  2. The algorithm takes record 2 and compares it with record 1. Since it is not a match, record 2 is added to the lookup table.
  3. The algorithm takes record 3 and compares it with record 1 and record 2. Record 3 matches record 1. So, record 3 is added to the group of record 1.
  4. The algorithm takes record 4 and compares it with record 1 and record 2 but not with record 3, which is not a master record. Since record 4 matches neither of them, it is added to the lookup table.
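The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Talend's implementation: `similarity` uses `difflib.SequenceMatcher` from the standard library as a stand-in for the configured matching function, and the function name `simple_vsr_sketch` and the `threshold` value of 0.8 are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in metric (assumption); Talend uses its own matching functions.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def simple_vsr_sketch(records, threshold=0.8):
    masters = []      # lookup table: (id, name) of each master record
    group_of = {}     # record id -> id of its master record
    comparisons = {}  # record id -> master ids it was compared with
    for rec_id, name in records:
        comparisons[rec_id] = []
        for master_id, master_name in masters:
            comparisons[rec_id].append(master_id)
            if similarity(name, master_name) >= threshold:
                group_of[rec_id] = master_id
                break  # first match wins; other masters are not tried
        else:
            # No master matched: the record becomes a new master.
            masters.append((rec_id, name))
            group_of[rec_id] = rec_id
    return group_of, comparisons


records = [
    (1, "John Doe"),
    (2, "Donna Lewis"),
    (3, "John B. Doe"),
    (4, "Louis Armstrong"),
]
group_of, comparisons = simple_vsr_sketch(records)
print(group_of)      # record id -> master id
print(comparisons)   # record id -> masters it was compared with
```

With this stand-in metric and threshold, records 1, 2 and 4 become master records and record 3 joins the group of record 1. Record 4 is compared with records 1 and 2 only, never with record 3. Because the sketch stops at the first match, record 3 is compared with record 1 only.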

The output will look like this:

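| id | fullName | Grp_ID | Grp_Size | Master | Score | GRP_QUALITY |
|----|----------|--------|----------|--------|-------|-------------|
| 1 | John Doe | 0 | 2 | true | 1.0 | 0.72 |
| 3 | John B. Doe | 0 | 0 | false | 0.72 | 0 |
| 2 | Donna Lewis | 1 | 1 | true | 1.0 | 1.0 |
| 4 | Louis Armstrong | 2 | 1 | true | 1.0 | 1.0 |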
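As a rough illustration of how the output columns relate to the groups, the sketch below rebuilds the sample rows from the group assignments and the match score shown above. It is not Talend code, and it rests on assumptions inferred from this sample only: group IDs are numbered in order of master creation, non-master rows carry a group size and group quality of 0, and the group quality of a master row is taken as the lowest match score in its group. The function `build_output` and its parameters are hypothetical.

```python
def build_output(records, group_of, scores):
    """Rebuild output rows from group assignments.

    records:  list of (id, fullName) in input order
    group_of: record id -> id of its master record
    scores:   record id -> match score, for non-master records only
    """
    master_ids = [rid for rid, _ in records if group_of[rid] == rid]
    rows = []
    for grp_id, mid in enumerate(master_ids):
        members = [(rid, name) for rid, name in records if group_of[rid] == mid]
        member_scores = [scores[rid] for rid, _ in members if rid != mid]
        # Assumption: group quality is the lowest score in the group,
        # and 1.0 for a group that contains only its master record.
        quality = min(member_scores, default=1.0)
        for rid, name in members:
            is_master = rid == mid
            rows.append({
                "id": rid,
                "fullName": name,
                "Grp_ID": grp_id,
                "Grp_Size": len(members) if is_master else 0,
                "Master": is_master,
                "Score": 1.0 if is_master else scores[rid],
                "GRP_QUALITY": quality if is_master else 0.0,
            })
    return rows


records = [
    (1, "John Doe"),
    (2, "Donna Lewis"),
    (3, "John B. Doe"),
    (4, "Louis Armstrong"),
]
group_of = {1: 1, 2: 2, 3: 1, 4: 4}
scores = {3: 0.72}  # match score of record 3, taken from the sample output

rows = build_output(records, group_of, scores)
for row in rows:
    print(row)
```

The rows come out grouped by master record, in the order 1, 3, 2, 4, which mirrors the sample output.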