Configuring the input component - 7.3

Natural Language Processing

Version: 7.3
Language: English
Product: Talend Big Data Platform, Talend Data Fabric, Talend Real-Time Big Data Platform
Module: Talend Studio
The tFileInputXML component is used to load the text to be processed.

Procedure

  1. Double-click the tFileInputXML component to open its Basic settings view and define its properties.
    1. Click the [...] button next to Edit schema to add the necessary columns to hold the input data.
    2. In the File name field, specify the path to the file to be processed.
    3. In the Element to extract field, enter "row".
    4. In the Loop XPath query field, enter the XPath query expression between double quotation marks to specify the node on which the loop is based.
    5. In the XPath query column of the Mapping table, specify the fields to be queried between double quotation marks.
  2. In the Advanced settings view of the component, select the Custom encoding check box if you encounter issues when processing the data.
  3. From the Encoding list, select the encoding to use (UTF-8 in this example).
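
The way the Loop XPath query and the Mapping table work together can be illustrated outside the Studio. The following Python sketch is not what tFileInputXML runs internally; it only mimics the idea of looping over one node and reading fields relative to it. The sample XML, the column names (id, text), and the XPath expressions are all assumptions made for illustration, since the procedure does not show the actual input file. Python's ElementTree supports only a subset of XPath, so the loop expression is written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: the real file layout is not shown in this procedure.
# It is encoded as UTF-8 to mirror the encoding selected in the Advanced settings.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<rows>
  <row><id>1</id><text>The café opens at nine.</text></row>
  <row><id>2</id><text>The meeting is in Paris.</text></row>
</rows>
""".encode("utf-8")

# Assumed equivalent of the Loop XPath query: one record per <row> element.
LOOP_XPATH = "./row"

# Assumed equivalent of the Mapping table: schema column -> XPath query,
# evaluated relative to the current loop node.
MAPPING = {"id": "id", "text": "text"}


def extract_rows(xml_bytes, loop_xpath, mapping):
    """Return one dict per loop node, with one key per mapped column."""
    root = ET.fromstring(xml_bytes)  # bytes input honors the declared encoding
    rows = []
    for node in root.findall(loop_xpath):
        rows.append({col: node.findtext(xpath) for col, xpath in mapping.items()})
    return rows


records = extract_rows(SAMPLE_XML, LOOP_XPATH, MAPPING)
for record in records:
    print(record)
```

Each dictionary corresponds to one row of the component's output, with one key per schema column. Decoding the same bytes with a different encoding than the one they were written in would garble the accented character, which is the kind of issue the Custom encoding option is meant to address.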