Rules with the T-Swoosh algorithm - Cloud

Talend Cloud API Services Platform Studio User Guide

Author: Talend Documentation Team

You can use the T-Swoosh algorithm to find duplicates and to define, through a survivorship function, how two similar records are merged into a master record. Each merged record is then used in turn to find further duplicates.
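The match-and-merge cycle described above can be pictured with a minimal sketch. This is not Talend's implementation: the `swoosh`, `is_match` and `merge` names, the matching rule (same phone or same email) and the sample records are all assumptions made for illustration.

```python
def swoosh(records, is_match, merge):
    """Illustrative Swoosh-style loop (hypothetical, not Talend's code)."""
    pending = list(records)   # records still to be compared
    masters = []              # records with no known duplicate so far
    while pending:
        current = pending.pop()
        # Look for an already-kept record that matches the current one.
        partner = next((m for m in masters if is_match(current, m)), None)
        if partner is None:
            masters.append(current)
        else:
            # Merge the pair and queue the result again, so the merged
            # record can itself be matched against the other records.
            masters.remove(partner)
            pending.append(merge(current, partner))
    return masters


def is_match(a, b):
    # Assumed rule: two records match if they share a phone or an email.
    return any(a[k] is not None and a[k] == b[k] for k in ("phone", "email"))


def merge(a, b):
    # Assumed rule: keep the first non-empty value of each column.
    return {k: a[k] if a[k] is not None else b[k] for k in a}


records = [
    {"name": "J. Smith", "phone": "555-0100", "email": None},
    {"name": "John Smith", "phone": "555-0100", "email": "js@example.com"},
    {"name": "John Smith", "phone": None, "email": "js@example.com"},
]
masters = swoosh(records, is_match, merge)
```

In this sample, the first and third records share no value directly. They end up in the same master record only because the intermediate merged record carries both the phone number and the email address.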

The differences between the T-Swoosh and the VSR algorithms are the following:
  • When using the T-Swoosh algorithm, the master record is generally a new record that does not exist in the list of input records.
  • When using the T-Swoosh algorithm, you can define a survivorship function for each column to create a master record.
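As a rough illustration of both points, the sketch below applies one survivorship function per column to a group of duplicates. The function names (`longest`, `most_recent`, `first_non_null`, `build_master`), the column names and the sample data are hypothetical and do not reflect how Talend Studio names or implements its options.

```python
def longest(values):
    """Keep the longest value."""
    return max(values, key=len)


def most_recent(values):
    """Keep the latest value; ISO 8601 date strings sort chronologically."""
    return max(values)


def first_non_null(values):
    """Keep the first value that is not empty."""
    return values[0]


def build_master(group, rules):
    """Build a master record by applying one rule per column."""
    master = {}
    for column, rule in rules.items():
        # Only non-empty values take part in survivorship.
        values = [r[column] for r in group if r[column] is not None]
        master[column] = rule(values) if values else None
    return master


# Hypothetical per-column configuration.
rules = {"name": longest, "phone": first_non_null, "updated": most_recent}

group = [
    {"name": "John Smith", "phone": None, "updated": "2023-01-10"},
    {"name": "J. Smith", "phone": "555-0100", "updated": "2024-03-02"},
]
master = build_master(group, rules)
```

Here the master record takes its name from the first input record and its phone number and date from the second, so it is identical to neither of them.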