Executing the Job - 7.0

Fuzzy matching

Author: Talend Documentation Team
Version: 7.0
Products: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Platform: Talend Studio

Procedure

Save your Job and press F6 to execute it.

Results

tFuzzyUniqRow uses the Levenshtein method to compare the data in each of the three defined columns separately and the Double Metaphone method to compare the data in the City column. It then passes the unique and duplicate rows to the defined output files. In this example, the first two rows match, so the second row goes to the "duplicates" output.
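The Levenshtein method measures how different two strings are by counting the minimum number of single-character insertions, deletions and substitutions needed to turn one into the other. The Python sketch below computes that distance and compares each key column separately, treating two records as a match only when every column is within a threshold. The column names (`first_name`, `last_name`, `email`, `city`), the `max_distance` threshold of 1 and the `records_match` helper are hypothetical, chosen for illustration rather than taken from the tFuzzyUniqRow configuration. Double Metaphone is not implemented here; the City comparison is only a case-insensitive placeholder.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # `previous` holds the distances from the previous prefix of `a`
    # to every prefix of `b` (one row of the dynamic-programming table).
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical names for the three columns compared with Levenshtein.
LEVENSHTEIN_COLUMNS = ("first_name", "last_name", "email")


def records_match(r1: dict, r2: dict, max_distance: int = 1) -> bool:
    """Compare each key column separately; every column must match."""
    for column in LEVENSHTEIN_COLUMNS:
        distance = levenshtein(r1[column].lower(), r2[column].lower())
        if distance > max_distance:  # hypothetical threshold
            return False
    # Placeholder for City: Double Metaphone would compare phonetic codes;
    # this sketch only does a case-insensitive exact comparison instead.
    return r1["city"].lower() == r2["city"].lower()
```

For example, `levenshtein("kitten", "sitting")` returns 3 (two substitutions and one insertion). Comparing columns one at a time is what lets each column use its own method, such as a phonetic comparison for City.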

The generated FID column gives the identifier of the original record to which the current record refers.

The third row is unique, so it goes to the "uniques" output.

The generated UID column holds the identifier generated for the main record.
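How rows end up in the two outputs can be sketched as follows: each incoming row is compared with the main records kept so far; a row that matches none becomes a new main record with its own UID, and a row that matches one goes to the duplicates with an FID referring to that record. This is a minimal Python sketch under stated assumptions, not the internal logic of tFuzzyUniqRow: the first matching main record wins, UIDs are sequential integers, the FID stores the original record's UID, and the `match` function (for example, a per-column Levenshtein comparison) is supplied by the caller. The `split_uniques_duplicates` name is hypothetical.

```python
from typing import Callable


def split_uniques_duplicates(
    rows: list,
    match: Callable[[dict, dict], bool],
) -> tuple:
    """Route rows to 'uniques' (with a UID) or 'duplicates' (with an FID)."""
    uniques = []
    duplicates = []
    next_uid = 1  # assumption: UIDs are sequential integers
    for row in rows:
        # Compare the row with every main record kept so far; first match wins.
        original = next((u for u in uniques if match(u, row)), None)
        if original is None:
            # No match: the row becomes a new main record with a fresh UID.
            uniques.append({**row, "UID": next_uid})
            next_uid += 1
        else:
            # Match: assumed FID points back to the UID of the original record.
            duplicates.append({**row, "FID": original["UID"]})
    return uniques, duplicates
```

With three rows where only the first two match, the first and third rows land in `uniques` with UIDs 1 and 2, and the second row lands in `duplicates` with an FID of 1, mirroring the example above.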