Working principle - 7.0

Data matching

author
Talend Documentation Team
EnrichVersion
7.0
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

This component implements the MapReduce model, using the blocking keys defined in the Blocking definition table of the Basic settings view to partition the records.

This implementation proceeds as follows:

  1. Splits the input rows into groups of a given size.

  2. Runs a Map class that associates each blocking key with the list of records that share it.

  3. Shuffles the records to group those with the same key together.

  4. Applies, for each key, the matching algorithm defined in the Key definition table of the Basic settings view.

    To do so, the component reads the records, compares each one with the existing master records, groups it with a master record it is similar to, and classifies every record that matches no master as a new master record.

  5. Outputs the groups of similar records with their group IDs, group sizes, matching distances and scores.
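
The five steps above can be illustrated with a small, single-process sketch. It is not the component's actual implementation: the blocking key (first letter of a name field), the similarity function (Python's difflib ratio), the 0.8 threshold, and the output field names group_id, group_size, is_master and score are all assumptions chosen for illustration only.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def blocking_key(record):
    # Hypothetical blocking key: first letter of the name, upper-cased.
    return record["name"][:1].upper()


def similarity(a, b):
    # Stand-in for the algorithm configured in the Key definition table.
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def map_phase(chunk):
    # Step 2: emit one (key, record) pair per input row.
    for record in chunk:
        yield blocking_key(record), record


def shuffle(pairs):
    # Step 3: bring records with the same key together.
    blocks = defaultdict(list)
    for key, record in pairs:
        blocks[key].append(record)
    return blocks


def reduce_phase(records, threshold):
    # Step 4: compare each record with the masters found so far.
    groups = []  # list of (master, [(member, score), ...])
    for record in records:
        for master, members in groups:
            score = similarity(record, master)
            if score >= threshold:
                members.append((record, score))
                break
        else:
            # No similar master: the record becomes a master itself.
            groups.append((record, [(record, 1.0)]))
    return groups


def match(rows, chunk_size=2, threshold=0.8):
    # Step 1: split the input rows into groups of a given size.
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    output, group_id = [], 0
    for _key, records in sorted(shuffle(pairs).items()):
        for master, members in reduce_phase(records, threshold):
            group_id += 1
            # Step 5: output each record with its group information.
            for record, score in members:
                output.append({
                    **record,
                    "group_id": group_id,
                    "group_size": len(members),
                    "is_master": record is master,
                    "score": round(score, 2),
                })
    return output


rows = [
    {"name": "John Smith"},
    {"name": "Jon Smith"},
    {"name": "Jane Doe"},
    {"name": "Alice"},
]
for row in match(rows):
    print(row)
```

In this sketch, a record is only ever compared with master records that share its blocking key, which is what keeps the number of comparisons manageable on large inputs.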