Editing rules and displaying sample results - Cloud - 7.3

Talend Studio User Guide

Talend Big Data
Talend Big Data Platform
Talend Cloud
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Talend Studio
Available in...

Big Data Platform

Cloud API Services Platform

Cloud Big Data Platform

Cloud Data Fabric

Cloud Data Management Platform

Data Fabric

Data Management Platform

Data Services Platform

MDM Platform

Real-Time Big Data Platform


  1. To define a second match rule, place your cursor on the top-right corner of the Matching Key table and click the [+] button to create a new rule.
    Follow the steps outlined in Defining a match rule to define the matching keys.
    When you define multiple rules in the match rule editor, an OR match operation is performed on the analyzed data: records are evaluated against the first rule, records that match it are not evaluated against the second rule, and so on.
  2. Click the button at the top-right corner of the Matching Key or Match and Survivor section and replace the default name of the rule with a name of your choice.
    If you define more than one rule in the match analysis, you can use the up and down arrows in the dialog box to change the rule order and thus decide which rule is executed first.
  3. Click OK.
    The rules are named and ordered accordingly in the section.
  4. In the Match threshold field, enter the match probability threshold.
    Two data records match when the probability is above this value.
    In the Confident match threshold field, set a numerical value between the current Match threshold and 1.
    If the GRP-QUALITY value calculated by the match analysis is equal to or greater than the Confident match threshold, you can be confident about the quality of the group.
  5. Click Chart to compute the groups according to the blocking key and match rule you defined in the editor, and to display the results for the sample data in a chart.
    This chart gives a global picture of the duplicates in the analyzed data. The Hide groups less than parameter, set to 2 by default, lets you decide which groups to show in the chart; you usually want to hide small groups.
    With the Hide groups less than parameter set to 2, which excludes unique items, the chart in the above image indicates that, out of the 1000 sample records examined:
    • 49 groups have 2 items each. In each group, the 2 items are duplicates of each other.

    • 7 groups have 3 duplicate items each, and the last group has 4 duplicate items.

    The Data table also shows the match details of the items in each group and colors each group to match its color in the match chart.
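As a rough illustration of the OR behavior described in step 1 and the rule ordering in step 2, the Python sketch below evaluates a pair of records against an ordered list of rules and stops at the first rule that matches. The record layout, the rules `email_rule` and `name_city_rule`, and the helper `first_matching_rule` are hypothetical; this is not how Talend Studio implements match rules internally.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical record layout: column name -> value.
Record = Dict[str, str]
# A rule is (name, predicate); the predicate says whether two records match under that rule.
Rule = Tuple[str, Callable[[Record, Record], bool]]


def first_matching_rule(a: Record, b: Record, rules: List[Rule]) -> Optional[str]:
    """Return the name of the first rule under which a and b match, or None.

    Rules are tried in list order and evaluation stops at the first match,
    so the pair matches overall if ANY rule matches (an OR of the rules).
    """
    for name, matches in rules:
        if matches(a, b):
            return name
    return None


# Hypothetical rules: case-insensitive e-mail match, then same last name and city.
email_rule: Rule = ("email_rule", lambda a, b: a["email"].lower() == b["email"].lower())
name_city_rule: Rule = ("name_city_rule", lambda a, b: a["last"] == b["last"] and a["city"] == b["city"])

r1 = {"email": "J.Doe@example.com", "last": "Doe", "city": "Paris"}
r2 = {"email": "j.doe@example.com", "last": "Doe", "city": "Lyon"}
r3 = {"email": "jd@example.org", "last": "Doe", "city": "Paris"}
r4 = {"email": "j.doe@example.com", "last": "Doe", "city": "Paris"}

rules = [email_rule, name_city_rule]
print(first_matching_rule(r1, r2, rules))  # email_rule
print(first_matching_rule(r1, r3, rules))  # name_city_rule
print(first_matching_rule(r2, r3, rules))  # None

# r1 and r4 satisfy both rules, so the rule order decides which one reports the match.
print(first_matching_rule(r1, r4, [email_rule, name_city_rule]))  # email_rule
print(first_matching_rule(r1, r4, [name_city_rule, email_rule]))  # name_city_rule
```

In this sketch, changing the rule order never changes whether a pair matches, only which rule is evaluated first and reports the match.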
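The two thresholds in step 4 can be illustrated with a small Python sketch. It assumes pairwise match scores between 0 and 1 and, purely as an assumption for illustration, takes a group's quality to be its lowest pairwise score; the actual GRP-QUALITY computation in Talend may differ. The function names and the threshold values 0.85 and 0.90 are hypothetical.

```python
from typing import Dict, Tuple

# Pairwise match scores in [0, 1], keyed by a pair of hypothetical record IDs.
Scores = Dict[Tuple[str, str], float]


def is_match(score: float, match_threshold: float) -> bool:
    """Two records match when their score is above the Match threshold."""
    return score > match_threshold


def group_quality(scores: Scores) -> float:
    """ASSUMPTION: use the lowest pairwise score in the group as its quality."""
    return min(scores.values())


def is_confident(quality: float, match_threshold: float, confident_threshold: float) -> bool:
    """True when the group quality is equal to or greater than the Confident match threshold."""
    # The Confident match threshold must lie between the Match threshold and 1.
    if not match_threshold <= confident_threshold <= 1.0:
        raise ValueError("confident_threshold must be between match_threshold and 1")
    return quality >= confident_threshold


MATCH_THRESHOLD = 0.85      # hypothetical value
CONFIDENT_THRESHOLD = 0.90  # hypothetical value

group_a: Scores = {("r1", "r2"): 0.97, ("r1", "r3"): 0.92, ("r2", "r3"): 0.88}
group_b: Scores = {("r4", "r5"): 0.95}

for name, group in (("group_a", group_a), ("group_b", group_b)):
    all_pairs_match = all(is_match(s, MATCH_THRESHOLD) for s in group.values())
    quality = group_quality(group)
    print(name, all_pairs_match, quality, is_confident(quality, MATCH_THRESHOLD, CONFIDENT_THRESHOLD))
# group_a True 0.88 False
# group_b True 0.95 True
```

Both groups pass the Match threshold, but only `group_b` reaches the Confident match threshold under the minimum-score assumption.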
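The group counts reported for the chart in step 5 can be checked with a short Python sketch that counts groups by size and applies a Hide groups less than filter. It assumes that the chart lists every group of two or more records and that all remaining sample records are unique (groups of size 1); the function name `chart_counts` is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def chart_counts(group_sizes: List[int], hide_less_than: int = 2) -> Dict[int, int]:
    """Map group size -> number of groups of that size, hiding groups smaller than hide_less_than."""
    counts = Counter(size for size in group_sizes if size >= hide_less_than)
    return dict(sorted(counts.items()))


# Group sizes from the example: 49 pairs, 7 triples and one group of 4.
duplicate_groups = [2] * 49 + [3] * 7 + [4]
records_in_duplicates = sum(duplicate_groups)      # 98 + 21 + 4 = 123
# ASSUMPTION: every other sample record forms a group of its own.
singletons = [1] * (1000 - records_in_duplicates)  # 877 unique records
all_groups = duplicate_groups + singletons

print(chart_counts(all_groups))                    # {2: 49, 3: 7, 4: 1}
print(chart_counts(all_groups, hide_less_than=3))  # {3: 7, 4: 1}
print(records_in_duplicates, len(singletons))      # 123 877
```

Under these assumptions, 123 of the 1000 sample records belong to the 57 duplicate groups shown in the chart, and the remaining 877 are unique.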