Configuration view - 7.0

Data matching

author
Talend Documentation Team
EnrichVersion
7.0
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

From this view, you can edit the configuration of the tMatchGroup component or define different configurations in which to execute the Job.

You can use these configurations for testing purposes, for example, but you can save only one configuration from the wizard: the one that is open.

In each configuration, you can define the parameters to generate match rules with the VSR or the T-Swoosh algorithm. The settings of the Configuration view differ slightly depending on whether you select Simple VSR or T-Swoosh in the basic settings of the tMatchGroup component.

You can define survivorship rules, one or more blocking keys, and multiple conditions by using several match rules. You can also set a different match interval for each rule. When a configuration has multiple conditions, the Job conducts an OR match operation, so the match results list the data records that meet any of the defined rules. The Job evaluates data records against the first rule, and the records that match are not evaluated against the other rules.
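The OR match operation described above can be sketched in a few lines of Python. This is only an illustration of the evaluation order, not the tMatchGroup implementation: the rule names, the columns, the thresholds and the similarity function (taken from the Python standard library) are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1]; a stand-in for a matching function."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical match rules: each one compares a single column and has its
# own match threshold. The order of the list is the evaluation order.
RULES = [
    {"name": "rule_1", "column": "email", "threshold": 1.0},
    {"name": "rule_2", "column": "name", "threshold": 0.8},
]


def first_matching_rule(rec_a: dict, rec_b: dict, rules=RULES):
    """Return the name of the first rule the pair satisfies, or None.

    Evaluation stops at the first match, so a pair that satisfies
    rule_1 is never scored against rule_2 (OR semantics).
    """
    for rule in rules:
        score = similarity(rec_a[rule["column"]], rec_b[rule["column"]])
        if score >= rule["threshold"]:
            return rule["name"]
    return None


a = {"name": "Jon Smith", "email": "jon@example.com"}
b = {"name": "John Smith", "email": "jon@example.com"}
c = {"name": "John Smith", "email": "j.smith@example.com"}
d = {"name": "Alice Wong", "email": "alice@example.com"}

print(first_matching_rule(a, b))  # matched by the first rule (same email)
print(first_matching_rule(a, c))  # falls through to the second rule (similar name)
print(first_matching_rule(a, d))  # no rule matches
```

In this sketch, a pair is reported as a match as soon as one rule accepts it, which mirrors the statement that records matched by the first rule are not evaluated against the remaining rules.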

The parameters required to edit or create a match rule are:
  • The Key definition parameters.

  • The Match Threshold field.

  • A blocking key in the Blocking Selection table (available only for rules with the VSR algorithm).

    Defining a blocking key is not mandatory, but it is advisable because it partitions the data into blocks and so reduces the number of records that need to be examined. For further information about the blocking key, see Importing match rules from the studio repository.

  • The Survivorship Rules for Columns parameters (available only for rules with the T-Swoosh algorithm).

  • The Default Survivorship Rules parameters for data types (available only for rules with the T-Swoosh algorithm).
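To illustrate why a blocking key reduces the number of records to examine, the following Python sketch compares the number of record pairs with and without blocking. It is a simplified model, not the component's code: the `zip` column, the choice of its first three characters as the blocking key, and the sample records are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record: dict) -> str:
    """Hypothetical blocking key: first three characters of the postal code."""
    return record["zip"][:3]


def candidate_pairs(records: list) -> list:
    """Return only the pairs of records that share a blocking key value."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)
    pairs = []
    for block in blocks.values():
        # Records are compared only with other records of the same block.
        pairs.extend(combinations(block, 2))
    return pairs


records = [
    {"id": 1, "zip": "75001"},
    {"id": 2, "zip": "75002"},
    {"id": 3, "zip": "75011"},
    {"id": 4, "zip": "69001"},
    {"id": 5, "zip": "69003"},
    {"id": 6, "zip": "13001"},
]

all_pairs = list(combinations(records, 2))   # every record against every other
blocked_pairs = candidate_pairs(records)     # only within each block

print(len(all_pairs), len(blocked_pairs))  # prints "15 4"
```

With six records, comparing everything requires 15 pairs, whereas the three blocks in this sample leave only 4 pairs to evaluate. The trade-off is that records placed in different blocks are never compared, so the blocking key should be chosen on a column that true duplicates are expected to share.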

Procedure

  1. In the basic settings of the tMatchGroup component, select Simple VSR from the Matching Algorithm list.
    Make sure that the matching algorithm selected in the basic settings of the component is the same as the one defined in the configuration wizard. Otherwise, the Job runs with default values for the parameters that are not compatible between the two algorithms.
  2. In the basic settings of the tMatchGroup component, click Preview to open the configuration wizard.
  3. Click the [+] button on the top right corner of the Configuration view.
    This creates, in a new tab, an exact copy of the last configuration.
  4. Edit or set the parameters for the new configuration in the Key definition and Blocking Selection tables.
  5. If needed, define several match rules for the open configuration as follows:
    1. Click the [+] button on the match rule bar to create an exact copy of the last rule in a new tab.
    2. Set the parameters for the new rule in the Key definition table and define its match interval.
    3. Follow the steps above to create as many match rules for a configuration as needed. You can define a different match interval for each rule.
    When a configuration has multiple conditions, the Job conducts an OR match operation. It evaluates data records against the first rule; the records that match are not evaluated against the second rule, and so on.
  6. Click the Chart button at the top right corner of the wizard to execute the Job in the open configuration.
    The matching results are displayed in the matching chart and table.
    Follow the steps above to create as many new configurations in the wizard as needed.
  7. To execute the Job in a specific configuration, open the configuration in the wizard and click the Chart button.
    The matching results are displayed in the matching chart and table.
  8. At the bottom right corner of the wizard, click either:
    • OK to save the open configuration.

      You can save only one configuration in the wizard.

    • Cancel to close the wizard and keep the configuration saved initially in the wizard.

Results

For an example of a match rule with the T-Swoosh algorithm, see Using survivorship functions to merge two records and create a master record.