For more technologies supported by Talend, see Talend components.
In this example, tMatchPairing uses a blocking key to compute the pairs of suspect duplicates in a list of early childhood education centers in Chicago.
The use case described here uses:
a tFileInputDelimited component to read the source file, which contains a list of early childhood education centers in Chicago coming from ten different sources;
a tMatchPairing component to pre-analyze the data, compute pairs of suspect duplicates, and generate a pairing model, which is used by the tMatchPredict component;
three tFileOutputDelimited components to output the suspect duplicates, a sample of suspect pairs, and the unique records; and
a tLogRow component to output the exact duplicates.
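To make the idea of a blocking key more concrete, the following is a minimal Python sketch of how records can be split into exact duplicates, suspect pairs, and unique records. It is only an illustration of the general technique, not the algorithm that tMatchPairing implements. The field names (name, zip), the sample records, and the helper functions blocking_key and pair_suspects are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: first three letters of the name (uppercased) plus the ZIP code.
    return record["name"].strip().upper()[:3] + "|" + record["zip"]


def pair_suspects(records):
    # Step 1: set aside exact duplicates (records identical on every field).
    seen = set()
    distinct, exact_duplicates = [], []
    for rec in records:
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:
            exact_duplicates.append(rec)
        else:
            seen.add(fingerprint)
            distinct.append(rec)

    # Step 2: group the remaining records by blocking key.
    blocks = defaultdict(list)
    for rec in distinct:
        blocks[blocking_key(rec)].append(rec)

    # Step 3: records alone in their block are unique; records that share
    # a block are paired with each other as suspect duplicates.
    suspect_pairs, unique_records = [], []
    for block in blocks.values():
        if len(block) == 1:
            unique_records.append(block[0])
        else:
            suspect_pairs.extend(combinations(block, 2))
    return suspect_pairs, unique_records, exact_duplicates


# Made-up sample data for illustration only.
records = [
    {"name": "Little Stars Academy", "zip": "60601"},
    {"name": "Little Stars Acad.", "zip": "60601"},
    {"name": "Sunrise Learning Center", "zip": "60614"},
    {"name": "Sunrise Learning Center", "zip": "60614"},
]
suspect_pairs, unique_records, exact_duplicates = pair_suspects(records)
```

Comparing records only within a block keeps the number of candidate pairs far below the number produced by comparing every record with every other record, which is the usual reason for choosing a blocking key.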