Rules with the T-Swoosh algorithm - 7.1

Talend Data Management Platform Studio User Guide

author: Talend Documentation Team
EnrichVersion: 7.1
EnrichProdName: Talend Data Management Platform
task: Design and Development
EnrichPlatform: Talend Studio

You can use the T-Swoosh algorithm to find duplicates and, through survivorship functions, to define how two similar records are merged into a master record. Each newly merged record is then used to find further duplicates.

The differences between the T-Swoosh and the VSR algorithms are the following:
  • When using the T-Swoosh algorithm, the master record is in general a new record that does not exist in the list of input records.
  • When using the T-Swoosh algorithm, you can define a survivorship function for each column to create a master record.
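The merge-and-recompare behavior described above can be illustrated with a minimal Python sketch. This is not Talend's implementation: the column names, the survivorship functions (`longest`, `concatenate`), the `is_match` rule and its 0.8 threshold are all hypothetical choices made for illustration. The sketch only shows the general idea that a merged record is built column by column and is then put back into the comparison pool.

```python
from difflib import SequenceMatcher

# Hypothetical survivorship functions, one per column (not Talend's API).
def longest(a, b):
    """Keep the longer of the two values."""
    return a if len(a) >= len(b) else b

def concatenate(a, b):
    """Keep both values, unless they are identical."""
    return a if a == b else f"{a}, {b}"

# Assumed configuration: which function builds each column of the master record.
SURVIVORSHIP = {"name": longest, "email": concatenate}

def is_match(r1, r2, threshold=0.8):
    """Assumed matching rule: case-insensitive similarity of the name column."""
    ratio = SequenceMatcher(None, r1["name"].lower(), r2["name"].lower()).ratio()
    return ratio >= threshold

def merge(r1, r2):
    """Build a new master record by applying each column's survivorship function."""
    return {col: fn(r1[col], r2[col]) for col, fn in SURVIVORSHIP.items()}

def swoosh(records):
    """Merge matching records until no two remaining records match."""
    pending = list(records)
    masters = []
    while pending:
        current = pending.pop()
        partner = next((m for m in masters if is_match(current, m)), None)
        if partner is None:
            masters.append(current)
        else:
            # The merged record replaces its partner and is compared again,
            # so it can match records that neither original matched alone.
            masters.remove(partner)
            pending.append(merge(current, partner))
    return masters
```

Given two similar input records with different email values, the resulting master record carries both emails, so it is a new record that does not appear in the input list.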