Rules with the T-Swoosh algorithm - Cloud - 8.0

Talend Studio User Guide

Version
Cloud
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Cloud
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Content
Design and Development
Last publication date
2024-02-29
Available in...

Big Data Platform

Cloud API Services Platform

Cloud Big Data Platform

Cloud Data Fabric

Cloud Data Management Platform

Data Fabric

Data Management Platform

Data Services Platform

MDM Platform

Real-Time Big Data Platform

You can use the T-Swoosh algorithm to find duplicates and to define how two similar records are merged to create a master record, using a survivorship function. These new merged records are used to find new duplicates.

The differences between the T-Swoosh and the VSR algorithms are the following. When using the T-Swoosh algorithm:
  • The master record is in general a new record that does not exist in the list of input records.
  • You can define a survivorship function for each column to create a master record.