This scenario applies only to a subscription-based Talend Platform solution with Big Data or Talend Data Fabric.
For more technologies supported by Talend, see Talend components.
In this Job, the tMatchIndex component creates an index in Elasticsearch and populates it with a clean, deduplicated data set that contains a list of education centers in Chicago.
After you have performed all the matching actions on the data set of education centers in Chicago, you do not need to restart the matching process from scratch when you receive new data records that have the same schema. Instead, you can index the clean data set in Elasticsearch using tMatchIndex for continuous matching purposes.
You have generated a pairing model using tMatchPairing.
Make sure the input data you want to index is clean and deduplicated.
For an example of how to clean and deduplicate a data set, see Scenario: Creating a clean data set from the suspect pairs labeled by tMatchPredict and the unique rows computed by tMatchPairing.
The Elasticsearch cluster must be running Elasticsearch 5+.
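One way to confirm the version requirement before running the Job is to query the cluster's root endpoint, which returns a JSON document containing a version.number field. The following Python sketch is illustrative only and is not part of the Talend Job: the function names and the default address http://localhost:9200 are assumptions, and a secured cluster would additionally need credentials or TLS settings.

```python
import json
from urllib.request import urlopen


def major_version(version_number: str) -> int:
    """Return the major version from a string such as '5.6.16'."""
    return int(version_number.split(".")[0])


def meets_requirement(root_response: dict, minimum_major: int = 5) -> bool:
    """Check the parsed JSON returned by the cluster root endpoint (GET /)."""
    return major_version(root_response["version"]["number"]) >= minimum_major


def check_cluster(base_url: str = "http://localhost:9200") -> bool:
    """Query a running cluster; base_url is an assumed, unsecured address."""
    with urlopen(base_url, timeout=5) as resp:
        return meets_requirement(json.load(resp))
```

Calling check_cluster() against a reachable cluster returns True when its major version is 5 or higher. The parsing is kept in meets_requirement so that it can be checked without a live cluster.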