tNLPPreprocessing - 7.3

Natural Language Processing

Version: 7.3
Language: English (United States)
Product: Talend Big Data Platform, Talend Data Fabric, Talend Real-Time Big Data Platform
Module: Talend Studio

Prepares a text sample and divides it into tokens, which can be words, numbers or punctuation marks.

tNLPPreprocessing outputs a column containing all the tokens of the input text, separated by tabs. You can convert this output to the CoNLL format and manually annotate the text. You can then use the annotated text with the tNLPModel component to design features and train a model.
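
To illustrate the conversion step, the following Python sketch turns tab-separated token rows into a CoNLL-style layout ready for manual annotation. It is not part of the component and does not reproduce Talend's own conversion. The function name `tokens_to_conll`, the two-column `token<TAB>label` layout, and the placeholder label `O` are assumptions made for this example; check the layout expected by tNLPModel before relying on it.

```python
def tokens_to_conll(token_rows, default_label="O"):
    """Convert tab-separated token rows into CoNLL-style text.

    Assumptions (not taken from the Talend documentation):
    - each input row is one string of tokens separated by tabs;
    - each output line is "token<TAB>label";
    - a blank line separates the tokens of two input rows;
    - every token starts with the placeholder label "O", to be
      replaced by hand during annotation.
    """
    blocks = []
    for row in token_rows:
        # Drop empty fields caused by leading, trailing or doubled tabs.
        tokens = [token for token in row.split("\t") if token]
        blocks.append("\n".join(f"{token}\t{default_label}" for token in tokens))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hypothetical component output: one tab-separated string per input text.
    rows = ["Paris\tis\tin\tFrance\t.", "It\thas\t20\tdistricts\t."]
    print(tokens_to_conll(rows), end="")
```

After the conversion, an annotator would replace the placeholder label on each relevant line (for example on `Paris` and `France`) with the label set chosen for the model.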

This component can run only with Spark 1.6 and 2.0.

For more technologies supported by Talend, see Talend components.