Matching measures - Cloud - 8.0

Data matching with Talend tools

Version
Cloud
8.0
Language
English
Product
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-06
To compare one attribute of two records, you can use any of the implemented matching functions, such as Exact, Levenshtein and Jaro-Winkler, or a custom matching algorithm you created.

You can also compare two records on many attributes. For two records to match, the following two conditions must hold:

  • When using the T-Swoosh algorithm, the score for each matching function in the match rule must exceed its threshold, if one is specified. By default, the threshold is set to 1. This means an exact match for most matching functions, except for Exact - ignore case and potentially any custom matching function.
  • The global score, computed as a weighted average of the scores of the different matching functions, must exceed the match threshold. The score is equal to Σ_i (w_i × s_i(r1, r2)) / Σ_i w_i, where w_i is the confidence weight of matching function i and s_i(r1, r2) is the score of matching function i over records r1 and r2.

In this example, the score for the Jaro-Winkler metric on the fname attribute must exceed 0.7 and the global score, with a confidence weight of 1 on each of the two attributes, must exceed 0.85.
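The two conditions can be sketched in a few lines of Python. This is an illustrative sketch only, not Talend's implementation: the function name `records_match`, its parameters, and the dictionary layout are hypothetical. The comparison uses `>=`, which is an assumption made so that a threshold of 1 can be satisfied by an exact match. Attributes without an entry in `thresholds` are simply not checked against a per-attribute threshold.

```python
from typing import Mapping


def records_match(
    scores: Mapping[str, float],
    weights: Mapping[str, float],
    thresholds: Mapping[str, float],
    match_threshold: float,
) -> bool:
    """Return True if two records match under both conditions.

    scores:          per-attribute matching scores s_i(r1, r2), in [0, 1]
    weights:         per-attribute confidence weights w_i
    thresholds:      optional per-attribute thresholds (first condition)
    match_threshold: threshold on the global score (second condition)
    """
    # Condition 1: every attribute with a threshold must reach it.
    # Assumption: ">=" is used so that a threshold of 1 accepts an exact match.
    for attribute, score in scores.items():
        threshold = thresholds.get(attribute)
        if threshold is not None and score < threshold:
            return False

    # Condition 2: the weighted average of the scores must reach
    # the match threshold.
    total_weight = sum(weights[a] for a in scores)
    global_score = sum(weights[a] * scores[a] for a in scores) / total_weight
    return global_score >= match_threshold


# Values taken from the example: Jaro-Winkler threshold of 0.7 on fname,
# confidence weight of 1 on both attributes, match threshold of 0.85.
example = records_match(
    scores={"lname": 1.0, "fname": 0.7222},
    weights={"lname": 1, "fname": 1},
    thresholds={"fname": 0.7},
    match_threshold=0.85,
)
```

Returning early on the first failed per-attribute check keeps the two conditions visibly separate, which mirrors how they are described above.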

This example shows the weighted average computation that yields the global score of two similar records:
  1. As the Confidence Weight of both attributes is set to 1, the normalized weight of each attribute is 0.5.
  2. The attribute matching distance is 1 for the lname attribute and 0.722... for the fname attribute.
  3. The score is calculated as follows: 0.5 × 1 + 0.5 × 0.722... = 0.8611...
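The three steps above can be reproduced with a short Python sketch. The function name `global_score` is hypothetical, and the fname score is truncated to 0.7222 because the exact value is not given in full here.

```python
from typing import Sequence


def global_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-attribute scores.

    Step 1: normalize the confidence weights so that they sum to 1.
    Steps 2-3: multiply each score by its normalized weight and add up.
    """
    total_weight = sum(weights)
    normalized = [w / total_weight for w in weights]
    return sum(nw * s for nw, s in zip(normalized, scores))


# lname score = 1, fname score = 0.722... (truncated to 0.7222 here),
# with a confidence weight of 1 on each attribute.
score = global_score(scores=[1.0, 0.7222], weights=[1, 1])
print(round(score, 4))  # 0.8611
```

With equal weights the result is the plain mean of the two scores. Raising the weight of one attribute shifts the global score toward that attribute's score.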