Creating a match analysis - Cloud

Talend Cloud API Services Platform Studio User Guide

author: Talend Documentation Team
EnrichVersion: Cloud
EnrichProdName: Talend Cloud
task: Design and Development
EnrichPlatform: Talend Management Console, Talend Studio
The match analysis enables you to compare a set of columns in databases or in delimited files and create groups of similar records using blocking and matching keys and/or survivorship rules.

About this task

This analysis enables you to create match rules and test them on data to assess the number of duplicates before using the rules elsewhere, for example in the tMatchGroup component. Currently, you can test match rules only on columns in the same table.

Prerequisite(s): You have selected the Profiling perspective of the studio. At least one database or file connection is defined under the Metadata node. For further information, see Connecting to a database.

The sequence of setting up a match analysis involves the following steps:

Procedure

  1. Creating the connection to a data source from inside the editor if no connection has been defined under the Metadata folder in the Studio tree view.
    For further information, see Configuring the match analysis.
  2. Defining the table or the group of columns you want to search for similar records using match processes.
  3. Defining blocking keys to reduce the number of pairs that need to be compared.
    For further information, see Defining a match rule.
  4. Defining match keys, the match methods according to which similar records are grouped together.
    For further information, see Defining a match rule.
  5. Exporting the match rules from the match analysis editor and centralizing them in the studio repository.
    For further information, see Importing or exporting match rules.
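
The roles of blocking keys and match keys in steps 3 and 4 can be sketched outside the studio with a few lines of Python. This is only an illustrative sketch under stated assumptions, not how the match analysis is implemented: the sample records, the choice of the zip column as blocking key, the choice of the name column as match key, the difflib similarity measure and the 0.8 threshold are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; column names and values are assumptions.
records = [
    {"id": 1, "name": "John Smith", "zip": "75001"},
    {"id": 2, "name": "Jon Smith", "zip": "75001"},
    {"id": 3, "name": "Mary Jones", "zip": "75001"},
    {"id": 4, "name": "Mary Jones", "zip": "31000"},
    {"id": 5, "name": "Marie Jones", "zip": "31000"},
]


def blocking_key(record):
    # Assumed blocking key: records are only compared within the same zip.
    return record["zip"]


def is_match(a, b, threshold=0.8):
    # Assumed match key: the name column, compared with difflib's
    # similarity ratio (a stand-in for a real match method).
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold


def candidate_pairs(rows):
    # Partition the rows by blocking key, then pair rows inside each block.
    blocks = defaultdict(list)
    for row in rows:
        blocks[blocking_key(row)].append(row)
    for block in blocks.values():
        yield from combinations(block, 2)


def group_duplicates(rows):
    # Merge matching pairs into groups with a small union-find structure.
    parent = {row["id"]: row["id"] for row in rows}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in candidate_pairs(rows):
        if is_match(a, b):
            parent[find(b["id"])] = find(a["id"])

    groups = defaultdict(list)
    for row in rows:
        groups[find(row["id"])].append(row["id"])
    return sorted(groups.values())


all_pairs = len(list(combinations(records, 2)))
blocked_pairs = len(list(candidate_pairs(records)))
print(all_pairs, blocked_pairs)   # 10 4
print(group_duplicates(records))  # [[1, 2], [3], [4, 5]]
```

In this sketch, blocking cuts the comparisons from 10 pairs to 4. It also shows the trade-off to check when testing rules: records 3 and 4 have identical names but fall into different blocks, so they are never compared and end up in separate groups.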