What is data matching? - Cloud - 8.0

Data matching with Talend tools

Version
Cloud
8.0
Language
English
Product
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Content
Data Governance > Third-party systems > Data Quality components > Matching components > Continuous matching components
Data Governance > Third-party systems > Data Quality components > Matching components > Data matching components
Data Governance > Third-party systems > Data Quality components > Matching components > Fuzzy matching components
Data Governance > Third-party systems > Data Quality components > Matching components > Matching with machine learning components
Data Quality and Preparation > Third-party systems > Data Quality components > Matching components > Continuous matching components
Data Quality and Preparation > Third-party systems > Data Quality components > Matching components > Data matching components
Data Quality and Preparation > Third-party systems > Data Quality components > Matching components > Fuzzy matching components
Data Quality and Preparation > Third-party systems > Data Quality components > Matching components > Matching with machine learning components
Design and Development > Third-party systems > Data Quality components > Matching components > Continuous matching components
Design and Development > Third-party systems > Data Quality components > Matching components > Data matching components
Design and Development > Third-party systems > Data Quality components > Matching components > Fuzzy matching components
Design and Development > Third-party systems > Data Quality components > Matching components > Matching with machine learning components
Last publication date
2024-02-06
Data matching is the process that enables you to find records representing the same entity in a dataset.

General definition

Data matching enables you to:
  • Find duplicates, potential duplicates and non-duplicates in a data source
  • Analyze data and return weighted probabilities of matching
  • Merge identical or similar entries into a single entry; and
  • Reduce disparity across different data sources.

Record linkage

Record linkage consists of identifying records that refer to the same entity in a dataset.

Two types of data record linkage exist:
  • Deterministic record linkage, which is based on identifiers that match; and
  • Probabilistic record linkage, which is based on the probability that identifiers match.