How to show the match results - Cloud - 7.3

Talend Studio User Guide

Last publication date: 2023-09-13
Available in:
    • Big Data Platform
    • Cloud API Services Platform
    • Cloud Big Data Platform
    • Cloud Data Fabric
    • Cloud Data Management Platform
    • Data Fabric
    • Data Management Platform
    • Data Services Platform
    • MDM Platform
    • Real-Time Big Data Platform

About this task

To collect duplicates from the input flow according to the match types you define (Levenshtein and Jaro-Winkler in this example), do the following:

Procedure

  1. If you are processing large data sets, select the Store on disk check box in the Analysis parameter view, then:
    • In the Max buffer size field, enter the amount of physical memory you want to allocate to the processed data.

    • In the Temporary data directory path field, set the path to the directory where you want to store the temporary file.

  2. Save the settings in the match analysis editor and press F6.
    The analysis is executed. The match rule and blocking key are computed against the whole dataset, and the Analysis Results view opens in the editor.
    In this view, the charts give an overall picture of the duplicates in the analyzed data. In the first table, you can read statistics about the number of processed records, distinct records with only one occurrence, duplicate (matched) records, and suspect records that did not match the rule. Duplicate records are records that matched with a score above the confidence threshold. In each matched pair, one record is a duplicate that should be discarded and the other is the survivor record.
    In the second table, you can read statistics about the number of groups and the number of records in each group. You can click any column header in the table to sort the results accordingly.
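
The statistics described above can be illustrated with a minimal Python sketch. It is not Talend's implementation, which this page does not show. It uses only a Levenshtein-based score (Jaro-Winkler is omitted for brevity), and the function names, the greedy grouping strategy, the sample names, the 0.6 match threshold and the 0.85 confidence threshold are all assumptions made for illustration. It also assumes that a group counts as suspect when any member scores below the confidence threshold.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalise the edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def group_records(names, match_threshold):
    """Hypothetical greedy grouping: attach each record to the first group
    whose first record (the master) scores at or above match_threshold."""
    groups = []
    for name in names:
        for group in groups:
            score = similarity(name.lower(), group[0][0].lower())
            if score >= match_threshold:
                group.append((name, score))
                break
        else:
            groups.append([(name, 1.0)])  # no group matched: start a new one
    return groups


def summarize(groups, confidence_threshold):
    """Count processed, unique, matched and suspect records, plus the
    number of groups per group size (assumed classification rule)."""
    stats = Counter()
    for group in groups:
        stats["processed"] += len(group)
        if len(group) == 1:
            stats["unique"] += 1
        elif min(score for _, score in group[1:]) >= confidence_threshold:
            stats["matched"] += len(group)
        else:
            stats["suspect"] += len(group)
    group_sizes = Counter(len(group) for group in groups)
    return dict(stats), dict(group_sizes)


# Made-up sample data, not taken from the Talend documentation.
names = ["Alice Brown", "Alice Browne", "John Smith", "Jhon Smit", "Maria Garcia"]
stats, group_sizes = summarize(group_records(names, 0.6), 0.85)
print(stats)
print(group_sizes)
```

With this sample, "Alice Brown" and "Alice Browne" form a group that scores above the assumed confidence threshold (two matched records), "John Smith" and "Jhon Smit" form a group that scores below it (two suspect records), and "Maria Garcia" is a unique record. The second dictionary maps each group size to the number of groups of that size, similar in spirit to the second table of the Analysis Results view.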