tNLPPreprocessing - 7.0

Natural Language Processing

EnrichVersion
7.0
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio
task
Data Governance > Third-party systems > Natural Language Processing
Data Quality and Preparation > Third-party systems > Natural Language Processing
Design and Development > Third-party systems > Natural Language Processing

Prepares a text sample and divides it into tokens, which can be words, numbers or punctuation marks.

tNLPPreprocessing outputs a column containing all the tokens of the input text, separated by tabs. You can convert this output to the CoNLL format and annotate the text manually. You can then use the annotated text with the tNLPModel component to design features and train a model.
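
To make the shape of this output concrete, the following Python sketch mimics the two steps outside of Talend Studio: splitting a text into tab-separated tokens, then writing one token per line with a placeholder label ready for manual annotation. It is only an illustration. The regular expression, the function names (`tokenize`, `to_tab_separated`, `to_conll`) and the `O` default label are assumptions made for this example; they do not reproduce the component's actual tokenizer or the exact column layout expected downstream.

```python
import re

# Assumption: a simple regex stands in for the component's tokenizer.
# It matches numbers (optionally with . or , separators), then words,
# then any single punctuation mark.
TOKEN_PATTERN = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")


def tokenize(text):
    """Split text into word, number and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def to_tab_separated(text):
    """Return all tokens of the text in one tab-separated string,
    similar in shape to the output column described above."""
    return "\t".join(tokenize(text))


def to_conll(tab_separated, default_label="O"):
    """Convert a tab-separated token string to a CoNLL-style layout:
    one token per line, followed by a tab and a label.
    The "O" placeholder label is an assumption; replace it by hand
    when annotating."""
    lines = [
        f"{token}\t{default_label}"
        for token in tab_separated.split("\t")
        if token
    ]
    return "\n".join(lines)


sample = "Talend released version 7.0 in 2018."
tokens_column = to_tab_separated(sample)
conll_text = to_conll(tokens_column)
print(conll_text)
```

In this sketch, each line of `conll_text` holds one token and its label, so an annotator can replace the placeholder on the lines that belong to an entity before the text is used for training.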

This component runs only with Spark 1.6 and Spark 2.0.

For more technologies supported by Talend, see Talend components.