Creating a Job to divide the input text into tokens in CoNLL format - 7.3

Natural Language Processing

Version: 7.3
Language: English
Product: Talend Big Data Platform, Talend Data Fabric, Talend Real-Time Big Data Platform
Module: Talend Studio
This Job uses the tNLPPreprocessing component to divide a text sample in XML format into tokens. The tNormalize component then converts the tokens to the CoNLL format.
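
To make the data flow concrete, the following Python sketch imitates the two stages outside of Talend Studio. It is an illustration only, not the components' actual implementation: the function names preprocess and normalize, the regular-expression tokenizer and the tab separator between tokens are all assumptions, and the output is the simplest CoNLL-style layout (one token per line, with a blank line between sentences).

```python
import re
from typing import List

# Assumed separator between tokens in the intermediate string; not taken from the Talend documentation.
SEPARATOR = "\t"


def preprocess(text: str) -> str:
    """Stand-in for the tokenization stage: split text into word and
    punctuation tokens and join them with SEPARATOR."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return SEPARATOR.join(tokens)


def normalize(tokens_field: str) -> List[str]:
    """Stand-in for the normalization stage: turn one separated string
    into one row per token, dropping empty items."""
    return [item for item in tokens_field.split(SEPARATOR) if item]


def to_conll(sentences: List[str]) -> str:
    """Write one token per line, with a blank line between sentences."""
    blocks = ["\n".join(normalize(preprocess(s))) for s in sentences]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_conll(["Talend splits text.", "Tokens are normalized."]), end="")
```

In the Job itself, the same two stages are handled by tNLPPreprocessing and tNormalize, and tFileOutputDelimited writes the result to a file.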

Procedure

  1. Drop the following components from the Palette onto the design workspace: tXMLFileInput, tNLPPreprocessing, tFilterColumns, tNormalize and tFileOutputDelimited.
  2. Connect the components using Row > Main connections.

Results