Computing suspect pairs and suspect sample from source data

This scenario applies only to Talend Platform products with Big Data and Talend Data Fabric.

In this example, tMatchPairing uses a blocking key to compute the pairs of suspect duplicates in a list of early childhood education centers in Chicago.

The use case described here uses:

  • a tFileInputDelimited component to read the source file, which contains a list of early childhood education centers in Chicago coming from ten different sources;

  • a tMatchPairing component to pre-analyze the data, compute pairs of suspect duplicates, and generate a pairing model that is then used by the tMatchPredict component;

  • three tFileOutputDelimited components to output the suspect duplicates, a sample of suspect pairs and the unique records; and

  • a tLogRow component to output the exact duplicates.
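
To make the four outputs listed above more concrete, here is a minimal Python sketch of the general idea behind blocking-key pairing. It is a conceptual illustration only and does not reproduce how tMatchPairing is implemented. The field names (id, name, address), the three-letter blocking key, the similarity measure and the 0.6 threshold are all assumptions chosen for the example, and the four records are made up.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record):
    # Hypothetical blocking key: first three letters of the name, lowercased.
    return record["name"].strip().lower()[:3]


def split_records(records, threshold=0.6):
    # Step 1: set aside exact duplicates (same normalized name and address).
    seen = set()
    distinct, exact_duplicates = [], []
    for rec in records:
        fingerprint = (rec["name"].strip().lower(), rec["address"].strip().lower())
        if fingerprint in seen:
            exact_duplicates.append(rec)
        else:
            seen.add(fingerprint)
            distinct.append(rec)

    # Step 2: group the remaining records by blocking key, so that only
    # records in the same block are compared with each other.
    blocks = defaultdict(list)
    for rec in distinct:
        blocks[blocking_key(rec)].append(rec)

    # Step 3: within each block, keep the pairs whose names are similar enough.
    suspect_pairs = []
    suspect_ids = set()
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if score >= threshold:
                suspect_pairs.append((a["id"], b["id"], round(score, 2)))
                suspect_ids.update((a["id"], b["id"]))

    # Step 4: records that belong to no suspect pair are treated as unique.
    unique_records = [rec for rec in distinct if rec["id"] not in suspect_ids]
    return suspect_pairs, unique_records, exact_duplicates


# Made-up sample data, for illustration only.
records = [
    {"id": 1, "name": "Little Learners Center", "address": "10 Main St"},
    {"id": 2, "name": "Little Learners Centre", "address": "10 Main Street"},
    {"id": 3, "name": "Little Learners Center", "address": "10 Main St"},
    {"id": 4, "name": "Sunrise Academy", "address": "42 Lake Ave"},
]
suspect_pairs, unique_records, exact_duplicates = split_records(records)
```

In this sketch, records 1 and 2 fall in the same block and form a suspect pair, record 3 is an exact duplicate of record 1, and record 4 is unique. Restricting comparisons to records that share a blocking key is what keeps the number of comparisons manageable on large data sets.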
