Prepares a text sample and divides it into tokens, which can be words, numbers or punctuation marks.
tNLPPreprocessing outputs a column containing all the tokens for the input text, separated by tabs. You can convert this output to the CoNLL format and manually annotate the text. You can then use the annotated text with the tNLPModel component to design features and train a model.
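The flow described above can be illustrated outside of Talend with a minimal Python sketch. It is not the component's implementation: the regular expression, the function names and the two-column layout (token, then a placeholder label "O" to be replaced during manual annotation) are all assumptions made for illustration, and the actual tokenization rules of tNLPPreprocessing may differ.

```python
import re

# Assumed tokenization rule (not taken from the component): a token is either
# a run of word characters (covers words and numbers) or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split a text sample into word, number and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def to_tab_separated(text):
    """Mimic the described output column: all tokens joined by tabs."""
    return "\t".join(tokenize(text))


def to_conll(tab_separated_rows, placeholder="O"):
    """Convert tab-separated token rows to a simplified CoNLL-style layout.

    Each token goes on its own line, followed by a tab and a placeholder
    label; rows are separated by a blank line. The placeholder is meant to
    be overwritten by hand during annotation.
    """
    blocks = []
    for row in tab_separated_rows:
        tokens = [token for token in row.split("\t") if token]
        blocks.append("\n".join(f"{token}\t{placeholder}" for token in tokens))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    row = to_tab_separated("Talend was founded in 2005.")
    print(to_conll([row]), end="")
```

Keeping one token per line makes the manual annotation step straightforward, because each label sits next to exactly one token.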
This component runs only with Spark 1.6 and 2.0.
For more technologies supported by Talend, see Talend components.