Checking the Levenshtein distance of 1 or 2 in first names

Data matching with Talend tools

Version: Cloud 8.0
Language: English
Product: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Last publication date: 2024-02-06
This scenario builds on the previous scenario. Only the minimum and maximum distance settings of the tFuzzyMatch component are modified, and this changes the output.
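The distance used in this scenario is the Levenshtein (edit) distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The Python sketch below shows the standard dynamic-programming computation of that metric. It illustrates the metric only and is not Talend's implementation. The function name `levenshtein` is our own, and the sketch compares strings case-sensitively.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein (edit) distance between strings a and b."""
    # previous[j] = distance between the first i-1 characters of a and the first j characters of b
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("Kourtney", "Courtney"))  # 1 (substitute K with C)
print(levenshtein("John", "Jon"))           # 1 (delete h)
print(levenshtein("Jason", "Jon"))          # 2 (delete a and s)
```

Only two rows of distances are kept in memory at any time, so memory grows with the length of the shorter loop string rather than with the product of both lengths.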

Procedure

  1. In the Component view of tFuzzyMatch, change the minimum distance from 0 to 1. This immediately excludes exact matches, which have a distance of 0.
  2. Also change the maximum distance to 2. The output then lists all matching entries within a distance of 2 at most, that is, no more than two single-character insertions, deletions, or substitutions away from the main flow entry.
    No other changes are required.
  3. Define the Matching item separator field, because several reference entries might match the same main flow entry.
  4. Save the new Job and press F6 to run it.
    FirstName|Name||
    Brad|Los angeles||
    Jason|New York|2|Jon
    Margaret|||
    Kourtney|Seattle|1|Courtney
    Nicole|Saint-Louis||
    John|Denver|1|Jon
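    Because the maximum distance has been set to 2, an entry of the main flow can match more than one reference entry. The matches are then listed together, separated by the Matching item separator. In the output above, the reference entry Jon is matched by two main flow entries: Jason (distance 2) and John (distance 1).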
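The combined effect of the minimum and maximum distance settings and the Matching item separator can be approximated in Python. This is a sketch under stated assumptions and not the component's code. The reference list holds only Jon and Courtney, because those are the only reference values visible in the output above; the full reference file is not shown in this section. The separator is assumed to be a comma. The helper name `fuzzy_match` and its output layout, with distances and matches both joined by the separator, are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein (edit) distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def fuzzy_match(name, references, min_dist, max_dist, separator=","):
    """Hypothetical helper: return (value, matching) strings for one main flow entry.

    Keeps every reference whose distance lies in [min_dist, max_dist] and joins
    the distances and the matched references with the separator (assumed layout).
    """
    hits = [(levenshtein(name, ref), ref) for ref in references]
    hits = [(d, ref) for d, ref in hits if min_dist <= d <= max_dist]
    value = separator.join(str(d) for d, _ in hits)
    matching = separator.join(ref for _, ref in hits)
    return value, matching


# Hypothetical reference data: only the two values visible in the output above.
REFERENCES = ["Jon", "Courtney"]
MAIN_FLOW = [("Brad", "Los angeles"), ("Jason", "New York"), ("Margaret", ""),
             ("Kourtney", "Seattle"), ("Nicole", "Saint-Louis"), ("John", "Denver")]

for first_name, city in MAIN_FLOW:
    value, matching = fuzzy_match(first_name, REFERENCES, min_dist=1, max_dist=2)
    print(f"{first_name}|{city}|{value}|{matching}")

# A hypothetical entry that matches two references at once:
print(fuzzy_match("Jonn", ["Jon", "John"], 1, 2))  # ('1,1', 'Jon,John')
```

With these assumed inputs, the loop prints the same six data rows as the Job output above. Setting `min_dist=0` would let an exact match such as `fuzzy_match("Jon", REFERENCES, 0, 2)` return `('0', 'Jon')`. Setting `min_dist=1` filters that match out, as step 1 describes.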

Results

You can also compare the main flow with the reference data using another method, Metaphone, which is described in the next scenario.