Analyzing duplicates - Cloud

Talend Cloud Data Management Platform Studio User Guide

You can use the match analysis in the Profiling perspective of Talend Studio to compare columns in databases or delimited files and create groups of similar records using the VSR or the T-Swoosh algorithm.

This analysis gives you a simple way to create match rules, test them on a set of columns, and see the results directly in the editor. After testing your match rules on data, you can export them from the editor and save them in the Talend Studio repository, where they can later be imported and used in matching components such as tMatchGroup, tRecordMatching, tGenKey, and the Hadoop matching components.

You can also use the Profiling perspective to define match rules in a match rule editor and save them in the Talend Studio repository.
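
To make the idea of a match rule more concrete, the following is a minimal Python sketch of grouping similar records with a weighted rule and a threshold. It is an illustration only: it is not the VSR or T-Swoosh algorithm and does not use any Talend API. The function names (`similarity`, `match_score`, `group_records`), the sample records, the column weights, and the 0.85 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity between two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a, rec_b, rule):
    """Weighted average of per-column similarities.

    `rule` is a hypothetical match rule: a dict mapping column name -> weight.
    """
    total_weight = sum(rule.values())
    weighted = sum(w * similarity(rec_a[col], rec_b[col]) for col, w in rule.items())
    return weighted / total_weight


def group_records(records, rule, threshold):
    """Greedy grouping: a record joins the first group whose first record it matches."""
    groups = []
    for rec in records:
        for group in groups:
            if match_score(group[0], rec, rule) >= threshold:
                group.append(rec)
                break
        else:
            # No existing group matched, so this record starts a new group.
            groups.append([rec])
    return groups


# Illustrative data and rule (assumed values, not taken from Talend Studio).
records = [
    {"name": "John Smith", "city": "Paris"},
    {"name": "Jon Smith", "city": "Paris"},
    {"name": "Maria Lopez", "city": "Madrid"},
]
rule = {"name": 2.0, "city": 1.0}

groups = group_records(records, rule, threshold=0.85)
print([len(g) for g in groups])  # [2, 1]
```

In this sketch, the two "Smith" records fall into one group because their weighted similarity exceeds the threshold, while the third record forms its own group. Testing a rule on sample data in the match analysis editor serves a similar purpose: it lets you check how the chosen columns, weights, and threshold affect the resulting groups before the rule is reused elsewhere.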