Creating a Job to divide the input text into tokens in CoNLL format - 7.0

Natural Language Processing

Author: Talend Documentation Team
Version: 7.0
Products: Talend Big Data Platform, Talend Data Fabric, Talend Real-Time Big Data Platform
Tasks:
Data Governance > Third-party systems > Natural Language Processing
Data Quality and Preparation > Third-party systems > Natural Language Processing
Design and Development > Third-party systems > Natural Language Processing
Platform: Talend Studio
This Job uses tNLPPreprocessing to divide a text sample in XML format into tokens. Then, tokens are converted to the CoNLL format using tNormalize.
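To show the target layout, the Python sketch below mimics the conversion step. It is not Talend code. It uses a CoNLL-style layout, with one token per line and a blank line closing each sentence. It assumes each input row holds one sentence, stored as a single delimited string. The `tokens` column name and the `|` separator are hypothetical. Splitting on the separator stands in for the role of tNormalize's item separator.

```python
from typing import Iterable, Mapping


def tokens_to_conll(rows: Iterable[Mapping[str, str]], column: str, separator: str) -> str:
    """Return CoNLL-style text: one token per line, a blank line after each row.

    Assumption (hypothetical): each row is one sentence whose tokens are stored
    as a single delimited string in `column`.
    """
    lines = []
    for row in rows:
        # Split the delimited list on the separator (the role tNormalize's item
        # separator plays); this sketch also drops empty items.
        lines.extend(token for token in row[column].split(separator) if token)
        lines.append("")  # blank line marks the sentence boundary
    return "\n".join(lines) + "\n"


sample = [{"tokens": "Talend|Studio|runs|Jobs|."}, {"tokens": "Bye|!"}]
print(tokens_to_conll(sample, "tokens", "|"), end="")
```

Each input row turns into several output lines, one per token. This is the row-multiplying behavior that makes tNormalize a fit for this step.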

Procedure

  1. Drop the following components from the Palette onto the design workspace: tXMLFileInput, tNLPPreprocessing, tFilterColumns, tNormalize and tFileOutputDelimited.
  2. Connect the components using Row > Main connections.
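The Row > Main connections above chain the five components into one data flow. The Python sketch below loosely models that flow as a chain of generators, one per component. It is an analogy only and is not what Talend Studio generates. Several details are hypothetical stand-ins: the XML layout, the `id`/`text`/`tokens` column names, the tab delimiter, and the regex tokenizer. In particular, the regex is not the tokenizer that tNLPPreprocessing uses.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical input layout: one <doc> element with a <text> child per document.
SAMPLE_XML = """<documents>
  <doc id="1"><text>Talend Studio runs Jobs.</text></doc>
  <doc id="2"><text>Tokens go one per line!</text></doc>
</documents>"""

# Naive stand-in tokenizer: words, or single punctuation characters.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def read_xml(xml_text):
    """tXMLFileInput stand-in: one row per <doc>, with id and text columns."""
    root = ET.fromstring(xml_text)
    for doc in root.iter("doc"):
        yield {"id": doc.get("id"), "text": doc.findtext("text", default="")}


def preprocess(rows):
    """tNLPPreprocessing stand-in: add a tokens column holding a list of tokens."""
    for row in rows:
        yield {**row, "tokens": TOKEN_RE.findall(row["text"])}


def filter_columns(rows, keep):
    """tFilterColumns stand-in: keep only the listed columns (choice is hypothetical)."""
    for row in rows:
        yield {name: row[name] for name in keep}


def normalize(rows, column):
    """tNormalize stand-in: emit one output row per item in `column`."""
    for row in rows:
        for token in row[column]:
            yield {**row, column: token}


def write_delimited(rows, out, delimiter="\t"):
    """tFileOutputDelimited stand-in: write each row as one delimited line."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow(row.values())


buffer = io.StringIO()
flow = normalize(filter_columns(preprocess(read_xml(SAMPLE_XML)), ["id", "tokens"]), "tokens")
write_delimited(flow, buffer)
print(buffer.getvalue(), end="")
```

Because each stage is a generator, rows stream through one at a time instead of being collected first. This loosely matches how rows pass along Main connections in a Job.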

Results