tNLPPreprocessing

Natural Language Processing

Author: Talend Documentation Team
EnrichVersion: 7.0
EnrichProdName: Talend Real-Time Big Data Platform; Talend Data Fabric; Talend Big Data Platform
Task: Data Governance > Third-party systems > Natural Language Processing; Data Quality and Preparation > Third-party systems > Natural Language Processing; Design and Development > Third-party systems > Natural Language Processing
EnrichPlatform: Talend Studio

Prepares a text sample and divides it into tokens, which can be words, numbers or punctuation marks.

tNLPPreprocessing outputs a column containing all the tokens of the input text, separated by tabs. You can convert this output to the CoNLL format and annotate the text manually. You can then use the annotated text with the tNLPModel component to design features and train a model.
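To make the shape of this output concrete, the following Python sketch shows one way a sentence could be split into word, number and punctuation tokens, joined with tabs, and then laid out one token per line for manual annotation. It is an illustration only, not the component's implementation: the regular expression, the function names (tokenize, to_tab_separated, to_conll) and the simplified two-column "token, label" layout with a placeholder label "O" are all assumptions made for this example.

```python
import re

# Assumed pattern for illustration: a run of letters/digits is one token,
# and any other non-space character (punctuation) is its own token.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word, number and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def to_tab_separated(text):
    """Return all tokens of the text in a single tab-separated string."""
    return "\t".join(tokenize(text))


def to_conll(tab_separated, default_label="O"):
    """Lay out tab-separated tokens one per line, each followed by a
    placeholder label that an annotator would later replace by hand."""
    tokens = tab_separated.split("\t") if tab_separated else []
    return "\n".join(f"{token}\t{default_label}" for token in tokens)


if __name__ == "__main__":
    row = to_tab_separated("Talend released version 7 in 2018.")
    print(row)            # one line, tokens separated by tabs
    print(to_conll(row))  # one token per line, ready for annotation
```

In this simplified layout, each line holds a token and a label separated by a tab. An annotator would replace the placeholder label on the relevant lines before the file is used for training.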

This component runs only with Spark 1.6 and Spark 2.0.

For more technologies supported by Talend, see Talend components.