Indexes a clean and deduplicated data set in Elasticsearch for continuous matching purposes.
Use tMatchIndex once the following steps are complete:
- You generated a pairing model and computed pairs of suspect duplicates using tMatchPairing.
- You labeled a sample of the suspect pairs manually or using Talend Data Stewardship to generate a matching model with tMatchModel.
- You predicted labels on suspect pairs based on the pairing and matching models using tMatchPredict.
- You cleaned and deduplicated the data set using tRuleSurvivorship.
After these steps, you do not need to restart the matching process from scratch when new data records with the same schema arrive. Instead, use tMatchIndex to index the clean data set in Elasticsearch for continuous matching.
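To make the idea of indexing a clean data set more concrete, the following Python sketch builds a request body in the newline-delimited format accepted by the Elasticsearch Bulk API. It is an illustration only and does not describe how tMatchIndex works internally. The function name `build_bulk_payload`, the index name `customers`, the `id` field and the `doc` mapping type are all hypothetical choices made for this example.

```python
import json


def build_bulk_payload(records, index_name, id_field, doc_type="doc"):
    """Build an Elasticsearch Bulk API body (NDJSON) for a list of records.

    Assumption: every record is a dict containing `id_field`. The `_type`
    entry is included because Elasticsearch 6.x expects a mapping type.
    """
    lines = []
    for record in records:
        # Action line: tells Elasticsearch where to index the next document.
        action = {
            "index": {
                "_index": index_name,
                "_type": doc_type,
                "_id": str(record[id_field]),
            }
        }
        lines.append(json.dumps(action))
        # Source line: the clean, deduplicated record itself.
        lines.append(json.dumps(record))
    # The Bulk API requires every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)


# Hypothetical clean records that share one schema.
clean_records = [
    {"id": 1, "name": "Ann Lee", "city": "Paris"},
    {"id": 2, "name": "Bob Ray", "city": "Lyon"},
]
payload = build_bulk_payload(clean_records, "customers", "id")
```

The resulting string could be sent to the `_bulk` endpoint of a cluster with any HTTP client. Using a stable record identifier as `_id` means that indexing the same clean record again overwrites the existing document instead of creating a duplicate.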
The tMatchIndex component supports Elasticsearch versions up to and including 6.4.2, and Apache Spark 2.0.0 and later.
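The version constraints can be checked before a Job is deployed. The following Python sketch is one possible way to do that, assuming plain dotted numeric version strings such as `6.4.2`. The helper names `parse_version` and `is_supported` are hypothetical and are not part of any Talend API.

```python
MAX_ELASTICSEARCH = (6, 4, 2)  # highest Elasticsearch version stated as supported
MIN_SPARK = (2, 0, 0)          # lowest Apache Spark version stated as supported


def parse_version(text):
    """Turn a dotted version string such as '6.4.2' into a tuple of ints.

    Assumption: the string contains only numeric components. Short versions
    such as '2.4' are padded with zeros so that comparisons behave as expected.
    """
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_supported(elasticsearch_version, spark_version):
    """Return True if both versions fall within the documented ranges."""
    es_ok = parse_version(elasticsearch_version) <= MAX_ELASTICSEARCH
    spark_ok = parse_version(spark_version) >= MIN_SPARK
    return es_ok and spark_ok


print(is_supported("6.4.2", "2.3.0"))  # both within range
print(is_supported("6.5.0", "2.3.0"))  # Elasticsearch too recent
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where `"10.0"` would sort before `"6.4"`.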
Because this component does not support Elasticsearch authentication, it cannot run on Databricks.
For more technologies supported by Talend, see Talend components.