Rules with the T-Swoosh algorithm - Cloud

Talend Cloud Data Management Platform Studio User Guide

Version: Cloud
Language: English (United States)
Product: Talend Cloud
Module: Talend Management Console, Talend Studio
Content: Design and Development

You can use the T-Swoosh algorithm to find duplicates and to define, using survivorship functions, how two similar records are merged into a master record. The merged records are then used in turn to find new duplicates.
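The loop below is a minimal Python sketch of the merge-and-requeue idea described above. It is not Talend's implementation. It follows the general pattern of the R-Swoosh entity-resolution algorithm, in which a merged record goes back into the queue so that it can match further records. Records are assumed to be plain dictionaries. The match rule `token_jaccard` (token-overlap similarity on a `name` column with a 0.5 threshold) and the merge rule `keep_longest` are hypothetical stand-ins for a configured match rule and survivorship functions.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

def token_jaccard(a: Record, b: Record, column: str = "name",
                  threshold: float = 0.5) -> bool:
    """Hypothetical match rule: token-set Jaccard similarity on one column."""
    ta = set(a[column].lower().split())
    tb = set(b[column].lower().split())
    if not ta or not tb:
        return False
    return len(ta & tb) / len(ta | tb) >= threshold

def keep_longest(a: Record, b: Record) -> Record:
    """Hypothetical survivorship: keep the longest value in every column."""
    return {col: max(a.get(col, ""), b.get(col, ""), key=len)
            for col in a.keys() | b.keys()}

def swoosh(records: List[Record],
           match: Callable[[Record, Record], bool],
           merge: Callable[[Record, Record], Record]) -> List[Record]:
    """R-Swoosh-style loop: merged records are put back into the queue."""
    pending = list(records)      # records not yet compared
    resolved: List[Record] = []  # records that match no other resolved record
    while pending:
        current = pending.pop()
        idx = next((i for i, r in enumerate(resolved) if match(current, r)), None)
        if idx is None:
            resolved.append(current)
        else:
            partner = resolved.pop(idx)
            # The merged (master) record is a new record; re-queue it so it
            # can be matched against the remaining records.
            pending.append(merge(current, partner))
    return resolved

records = [
    {"name": "John Smith", "email": "john@example.com"},
    {"name": "John A. Smith", "email": ""},
    {"name": "Jane Doe", "email": "jane@example.com"},
]
masters = swoosh(records, token_jaccard, keep_longest)
for m in masters:
    print(m)
```

With this input, "John Smith" and "John A. Smith" match. Their merged record (name "John A. Smith", email "john@example.com") does not appear anywhere in the input list.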

The T-Swoosh algorithm differs from the VSR algorithm in the following ways:
  • When using the T-Swoosh algorithm, the master record is generally a new record that does not exist in the list of input records.
  • When using the T-Swoosh algorithm, you can define a survivorship function for each column to create a master record.
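To illustrate per-column survivorship, the following Python sketch merges two matched records and applies a different rule to each column. The column names and the rule functions `longest`, `largest`, and `concatenate` are hypothetical examples. They are not Talend's survivorship functions and do not reproduce their exact semantics.

```python
from datetime import date
from typing import Callable, Dict

Record = Dict[str, object]
Survivorship = Callable[[object, object], object]

# Hypothetical survivorship functions, for illustration only.
def longest(a, b):
    """Keep the longer value (compared as text); a tie keeps the first argument."""
    return a if len(str(a)) >= len(str(b)) else b

def largest(a, b):
    """Keep the larger value, e.g. the most recent date."""
    return max(a, b)

def concatenate(a, b, sep=", "):
    """Join the distinct values of both records, in first-seen order."""
    parts = [p for p in str(a).split(sep) + str(b).split(sep) if p]
    return sep.join(dict.fromkeys(parts))

def merge_with_survivorship(a: Record, b: Record,
                            rules: Dict[str, Survivorship]) -> Record:
    """Build the master record column by column, one rule per column.
    Assumes both records contain every column named in `rules`."""
    return {col: rule(a[col], b[col]) for col, rule in rules.items()}

a = {"name": "John Smith", "phone": "555-0100", "last_order": date(2023, 1, 15)}
b = {"name": "John A. Smith", "phone": "555-0199", "last_order": date(2024, 3, 2)}
rules = {"name": longest, "phone": concatenate, "last_order": largest}
master = merge_with_survivorship(a, b, rules)
print(master)
```

Because the phone values are concatenated, the master record differs from both inputs. Because `concatenate` keeps only distinct values, merging the master record again with either source leaves its phone column unchanged. This property matters when merged records are fed back in to find new duplicates.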