Data quality components - 6.4

Talend Data Fabric Release Notes

  • Two new matching components that run on the Apache Spark framework have been introduced in Talend Studio: tMatchIndex, which indexes a clean data set in Elasticsearch, and tMatchIndexPredict, which matches new records against a data set indexed in Elasticsearch.

  • New natural language processing components that run on the Apache Spark framework have been introduced in Talend Studio: tNLPPreprocessing, which prepares a text sample; tNLPModel, which creates a model for named entity recognition tasks; tCompareColumns, which compares the values of two columns and can generate features for your model learning tasks; and tNLPPredict, which automatically labels text data with your own model or generic models.

  • The tRuleSurvivorship component has been substantially improved, especially in how it manages conflict rules.

  • The tPatternCheck, tPatternExtract and tMultiPatternCheck components have been substantially improved.

  • Users can now customize the schema of the tQASBatchAddressRow component.

  • The tMatchPairing and tMatchModel components now connect directly to Talend Data Stewardship, so the labelling task can be done in Talend Data Stewardship.