Scenario 2: Levenshtein distance of 1 or 2 in first names - 6.1

Talend Components Reference Guide


This scenario builds on the scenario described above. Only the minimum and maximum distance settings of the tFuzzyMatch component are modified, which changes the output displayed.

  1. In the Component view of tFuzzyMatch, change the minimum distance from 0 to 1. This immediately excludes exact matches, which have a distance of 0.

  2. Also change the maximum distance to 2. The output then lists every matching entry that differs from the main flow entry by at most two single-character edits (insertions, deletions or substitutions).

    No other changes are required.

  3. Make sure the Matching item separator is defined, as several reference entries might match the same main flow entry.

  4. Save the new Job and press F6 to run it.

    Because the maximum distance is now set to 2, some entries of the main flow match more than one reference entry.
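
The effect of the minimum and maximum distance settings can be illustrated outside the Studio. The following is a minimal Python sketch, not the code that tFuzzyMatch generates: the function names `levenshtein` and `fuzzy_match`, their parameters, and the sample first names are all assumptions made for illustration. It computes the Levenshtein distance between a main flow value and each reference value, keeps the references whose distance falls between the minimum and the maximum, and joins them with a separator.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def fuzzy_match(value, references, min_distance=1, max_distance=2, separator=","):
    """Hypothetical helper: join every reference within the distance window."""
    hits = [ref for ref in references
            if min_distance <= levenshtein(value, ref) <= max_distance]
    return separator.join(hits)


# Illustrative data only, not taken from the scenario's input files.
references = ["John", "Jon", "Joan", "Jane"]
print(fuzzy_match("John", references))  # "John" itself (distance 0) is excluded
```

With a minimum distance of 1, the exact match "John" is dropped, while "Jon" and "Joan" (each at distance 1) are returned together, which is why a separator is needed. "Jane" is at distance 3 and falls outside the window.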

You can also use another method, Metaphone, to assess the distance between the main flow and the reference. It is described in the next scenario.