tMatchIndex

Continuous matching

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Data Fabric
Talend Big Data Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio
Talend Data Stewardship

Indexes a clean and deduplicated data set in ElasticSearch for continuous matching purposes.

Before you index a data set in ElasticSearch using the tMatchIndex component, make sure that all of the following matching and deduplication tasks have been completed on this data set:
  • You generated a pairing model and computed pairs of suspect duplicates using tMatchPairing.

  • You labeled a sample of the suspect pairs manually or using Talend Data Stewardship to generate a matching model with tMatchModel.

  • You predicted labels on suspect pairs based on the pairing and matching models using tMatchPredict.

  • You cleaned and deduplicated the data set using tRuleSurvivorship.
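
The four prerequisite stages can be pictured with a small, self-contained Python sketch. It does not reproduce how the Talend components work internally: the sample records, the field names, the blocking key (city), the 0.85 similarity threshold and the "keep the lowest id" survivorship rule are all assumptions made for illustration, and a fixed threshold stands in for the trained matching model.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "name": "John Smith", "city": "Boston"},
    {"id": 2, "name": "Jon Smith", "city": "Boston"},
    {"id": 3, "name": "Mary Jones", "city": "Denver"},
    {"id": 4, "name": "Alan Smyth", "city": "Boston"},
]


def suspect_pairs(rows):
    """Pairing stage: pair up records that share a blocking key (here, the city)."""
    return [(a, b) for a, b in combinations(rows, 2) if a["city"] == b["city"]]


def predict_label(a, b, threshold=0.85):
    """Model and prediction stages: a fixed similarity threshold stands in
    for a matching model trained on manually labeled pairs."""
    score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return "match" if score >= threshold else "no_match"


def survive(rows, pairs):
    """Survivorship stage: for each matched pair, drop the record with the higher id."""
    dropped = set()
    for a, b in pairs:
        if predict_label(a, b) == "match":
            dropped.add(max(a["id"], b["id"]))
    return [r for r in rows if r["id"] not in dropped]


# The clean, deduplicated data set that would then be indexed.
clean = survive(records, suspect_pairs(records))
```

In this toy run, records 1 and 2 are treated as duplicates and only record 1 survives. The sketch handles pairs independently and does not group chains of duplicates into clusters, which a real survivorship step would need to do.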

Once these tasks are done, you do not need to restart the matching process from scratch when you receive new data records that have the same schema. Instead, you can index the clean data set in ElasticSearch using tMatchIndex and use it for continuous matching.
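
To give an idea of what indexing a clean data set involves outside of Talend Studio, the following Python sketch builds a request body in the newline-delimited JSON format expected by the ElasticSearch Bulk API. It is not what tMatchIndex does internally. The function name build_bulk_payload, the index name clean_customers, the document type record and the sample rows are all hypothetical, and sending the body to a cluster is left out.

```python
import json


def build_bulk_payload(rows, index_name, doc_type="record", id_field="id"):
    """Serialize rows as a Bulk API body: one action line, then one source line, per row."""
    lines = []
    for row in rows:
        # "_type" is expected by ElasticSearch 5.x; later major versions
        # deprecate and then remove it, so drop it when targeting those.
        action = {
            "index": {
                "_index": index_name,
                "_type": doc_type,
                "_id": str(row[id_field]),
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # The Bulk API requires the body to end with a newline character.
    return "\n".join(lines) + "\n"


# Hypothetical clean, deduplicated rows.
clean_rows = [
    {"id": 1, "name": "John Smith", "city": "Boston"},
    {"id": 3, "name": "Mary Jones", "city": "Denver"},
]

payload = build_bulk_payload(clean_rows, "clean_customers")
```

The resulting string would be sent to the _bulk endpoint of the cluster with the Content-Type header set to application/x-ndjson.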

New data records can then be compared against the indexed data set using tMatchIndexPredict. For more information, see tMatchIndexPredict.

This component can run only with Spark 2.0+ and ElasticSearch 5+.

For more technologies supported by Talend, see Talend components.