For more technologies supported by Talend, see Talend components.
This scenario describes a four-component Job that writes all unique entries and all duplicate entries to two separate files. Duplicates are identified by comparing a few selected columns using the Levenshtein and Double Metaphone matching types.
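The Levenshtein matching type is based on edit distance: the minimum number of single-character insertions, deletions, or substitutions needed to turn one value into another. The following Python sketch computes that distance so you can see how close the sample values are. It illustrates the general metric only. It is not Talend's internal implementation, and it does not cover Double Metaphone, which is a phonetic encoding rather than an edit distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # prev[j] is the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("Paul", "Raul"))         # 1
print(levenshtein("New York", "New Ork"))  # 2 (case-sensitive: "Y" deleted, "o" -> "O")
print(levenshtein("new york", "new ork"))  # 1
```

Note that the comparison is case-sensitive as written; lowercasing both values first, as in the last call, is an assumption you may or may not want depending on your data.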
The input file in this example looks like the following:
ID;Status;FirstName;Email;City;Initial;ZipCode
1;married;Paul;firstname.lastname@example.org;New York;P.N.;55677
2;single;Raul;email@example.com;New Ork;R.N.;55677
3;single;Mary;firstname.lastname@example.org;Chicago;M.N;66898
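To make the idea of splitting records into unique and duplicate outputs concrete, here is a minimal Python sketch that reads the semicolon-delimited sample above and separates the rows by fuzzy-matching one column with Levenshtein distance. Several parts are assumptions, not taken from the Talend Job: the matched column, the hypothetical threshold `MAX_DISTANCE = 1`, the case-insensitive comparison, and the rule that the first record of each group counts as unique while later matches count as duplicates. Double Metaphone is not implemented.

```python
import csv
import io

# The sample input shown above: semicolon-delimited, with a header row.
SAMPLE = """ID;Status;FirstName;Email;City;Initial;ZipCode
1;married;Paul;firstname.lastname@example.org;New York;P.N.;55677
2;single;Raul;email@example.com;New Ork;R.N.;55677
3;single;Mary;firstname.lastname@example.org;Chicago;M.N;66898
"""

MAX_DISTANCE = 1  # hypothetical threshold, not a Talend default


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def split_unique_duplicates(rows, column):
    """Assumed rule: a row is a duplicate if its value (lowercased) is within
    MAX_DISTANCE of a row already kept as unique; otherwise it is unique."""
    uniques, duplicates = [], []
    for row in rows:
        value = row[column].lower()
        if any(levenshtein(value, u[column].lower()) <= MAX_DISTANCE
               for u in uniques):
            duplicates.append(row)
        else:
            uniques.append(row)
    return uniques, duplicates


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
uniques, duplicates = split_unique_duplicates(rows, "FirstName")
print([r["ID"] for r in uniques])     # ['1', '3']
print([r["ID"] for r in duplicates])  # ['2']
```

With these assumptions, "Raul" is one edit away from "Paul" and lands in the duplicates, while "Mary" stays unique. Matching on `City` gives the same split, because "new ork" is one edit away from "new york". Writing each list to its own file could be done with `csv.DictWriter` using the same `;` delimiter.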