Defining a matching key - Cloud - 8.0

Talend Studio User Guide

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-29
Available in:
  • Big Data Platform
  • Cloud API Services Platform
  • Cloud Big Data Platform
  • Cloud Data Fabric
  • Cloud Data Management Platform
  • Data Fabric
  • Data Management Platform
  • Data Services Platform
  • MDM Platform
  • Real-Time Big Data Platform

Procedure

  1. In the Matching Key table of the rule editor, click the [+] button to add a row to the table.
  2. Set the parameters of the matching key as follows:
    • Match Key Name: Enter the name of your choice for the match key.

    • Matching Function: Select the type of matching you want to perform from the drop-down list. Select Custom if you want to use an external user-defined matching algorithm.

      In this example, two match keys are defined: the Levenshtein method is applied to first names and the Jaro-Winkler method to last names, in order to identify duplicate records.

    • Custom Matcher: This item is only used with the Custom matching function. Browse and select the Jar file of the user-defined algorithm.

    • Confidence Weight: Assign a numerical weight (between 1 and 10) to the column you want to use as a match key. This value gives greater or lesser importance to certain columns when performing the match.

    • Handle Null: Specify how to deal with data records which contain null values.

    For further information about the match rule parameters, see the tMatchGroup documentation.
  3. In the Match threshold field, enter the match probability threshold. Two data records match when the probability is above this value.
    In the Confident match threshold field, set a numerical value between the current Match threshold and 1. Above this threshold, you can be confident about the quality of the group.
  4. To define a second match rule, place your cursor on the top right corner of the Matching Key table and then click the [+] button.
    Repeat the previous steps to create the match rule.
    When you define multiple conditions in the match rule editor, an OR match operation is conducted on the analyzed data. Records are evaluated against the first rule and the records that match are not evaluated against the second rule.
  5. Optional: To replace the default names of the rules, click the Edit/Sort Rule Names icon on the top right corner of the table.
    You can also use the up and down arrows in the dialog box to change the rule order and thus decide which rule is executed first.
  6. Click OK.
    The rules are named and ordered accordingly in the Matching Key table.
  7. Save the match rule settings.
    The match rule is saved and centralized under Libraries > Rule > Match in the DQ Repository tree view.
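
The way these settings interact can be pictured with a short Python sketch. It is an illustration only and not Talend's implementation. It assumes that the score of a rule is the confidence-weighted average of the per-key similarities, that each rule carries its own match threshold, and that the confident match threshold can be applied to a single pair of records (the procedure describes it at group level). All names (MatchKey, MatchRule, rule_score, classify, first_matching_rule) and the sample records are hypothetical, and the Handle Null parameter is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


def levenshtein_similarity(a: str, b: str) -> float:
    """Return 1 - edit_distance / longest_length (1.0 for identical strings)."""
    if a == b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def jaro_winkler_similarity(a: str, b: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    jaro = (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)


@dataclass
class MatchKey:  # hypothetical stand-in for one row of the Matching Key table
    name: str
    column: str
    function: Callable[[str, str], float]
    weight: int = 1  # Confidence Weight, between 1 and 10


@dataclass
class MatchRule:  # hypothetical stand-in for one rule in the editor
    name: str
    keys: List[MatchKey]
    match_threshold: float


def rule_score(rule: MatchRule, r1: Dict[str, str], r2: Dict[str, str]) -> float:
    """Assumed scoring: weighted average of the similarities of all keys."""
    total = sum(k.weight for k in rule.keys)
    return sum(k.weight * k.function(r1[k.column], r2[k.column])
               for k in rule.keys) / total


def classify(score: float, match_threshold: float, confident_threshold: float) -> str:
    """Compare a score with the two thresholds (simplified to a record pair)."""
    if score > confident_threshold:
        return "confident match"
    return "match" if score > match_threshold else "no match"


def first_matching_rule(rules: List[MatchRule], r1: Dict[str, str],
                        r2: Dict[str, str]) -> Optional[str]:
    """OR behaviour: rules are tried in order and the first match wins."""
    for rule in rules:
        if rule_score(rule, r1, r2) > rule.match_threshold:
            return rule.name
    return None


# Sample data: Levenshtein on first names, Jaro-Winkler on last names.
rule1 = MatchRule("Rule 1", [
    MatchKey("first", "first_name", levenshtein_similarity, weight=1),
    MatchKey("last", "last_name", jaro_winkler_similarity, weight=1),
], match_threshold=0.8)
a = {"first_name": "Jon", "last_name": "Smith"}
b = {"first_name": "John", "last_name": "Smyth"}
print(classify(rule_score(rule1, a, b), rule1.match_threshold, 0.9))
```

Raising the weight of one key pulls the rule score toward that key's similarity, and reordering the list passed to first_matching_rule mirrors changing the rule order in the Edit/Sort Rule Names dialog box.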