Two new matching components that run in the Apache Spark framework have been introduced in Talend Studio: tMatchIndex, which indexes a clean data set in ElasticSearch, and tMatchIndexPredict, which matches new records against a data set indexed in ElasticSearch.
New natural language processing components that run in the Apache Spark framework have been introduced in Talend Studio: tNLPPreprocessing, which prepares a text sample; tNLPModel, which creates a model for named entity recognition tasks; tCompareColumns, which compares the values of two columns and can generate features for your model learning tasks; and tNLPPredict, which automatically labels text data with your own model or with generic models.
The tRuleSurvivorship component has been improved, especially in its management of conflict rules.
The tPatternCheck, tPatternExtract and tMultiPatternCheck components have been improved.
Users can now customize the schema of the tQASBatchAddressRow component.
The tMatchPairing and tMatchModel components now connect directly to Talend Data Stewardship, where the labelling task can now be done.