Comparing four columns using different matching methods and collecting encountered duplicates

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real-Time Big Data Platform, Talend MDM Platform, Talend Data Services Platform, and Talend Data Fabric.

This scenario describes a four-component Job that compares a few defined columns using the Levenshtein and Double Metaphone matching types, then collects all unique entries in one file and all duplicate entries in a separate file.

The input file in this example looks like the following:

ID;Status;FirstName;Email;City;Initial;ZipCode
1;married;Paul;pnewman@comp.com;New York;P.N.;55677
2;single;Raul;rnewman@comp.com;New Ork;R.N.;55677
3;single;Mary;mnewman@comp.com;Chicago;M.N;66898
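
To give a feel for why the first two records are likely to be flagged as near-duplicates, the following is a minimal Python sketch of a Levenshtein comparison run on the sample above. It is an illustration only, not the code the Talend component runs: the function names `levenshtein` and `similarity`, the normalization of the distance by the longer string length, and the case-insensitive comparison of the City column are all assumptions made for this example. Double Metaphone is not covered here.

```python
import csv
import io

# Sample input copied from the scenario (semicolon-delimited).
SAMPLE = """ID;Status;FirstName;Email;City;Initial;ZipCode
1;married;Paul;pnewman@comp.com;New York;P.N.;55677
2;single;Raul;rnewman@comp.com;New Ork;R.N.;55677
3;single;Mary;mnewman@comp.com;Chicago;M.N;66898
"""


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 means identical, 0.0 means nothing in common."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
first = rows[0]

# Compare the first record with each of the others on two columns.
for other in rows[1:]:
    name_distance = levenshtein(first["FirstName"], other["FirstName"])
    # Assumption: City is compared case-insensitively.
    city_distance = levenshtein(first["City"].lower(), other["City"].lower())
    print(first["ID"], other["ID"], name_distance, city_distance)
```

With this sketch, "Paul" and "Raul" are one edit apart, and "New York" and "New Ork" are one edit apart once case is ignored, whereas "Paul" and "Mary" need three edits. A real Job would apply a configurable threshold to decide which rows count as duplicates.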
