Configuring the input component - Cloud - 8.0

Text standardization

Version: Cloud, 8.0
Language: English
Product: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio

Before you begin

You have retrieved the tJapaneseTokenize_standard_scenario.zip file.

Procedure

  1. Double-click tFileInputDelimited to open its Basic settings view in the Component tab.
  2. In the File name/Stream field, enter the path to the file containing the input text to be tokenized.
  3. Define the characters to be used as Row Separator and Field Separator.
  4. Define the number of rows to skip in the Header and Footer fields.
  5. Click the Edit schema button to define the columns of the source dataset and their data type.
  6. Click the [+] button to add the schema columns.

  7. Click OK to validate these changes and accept the propagation when prompted.
  8. In the Advanced settings tab of the tFileInputDelimited component, select the encoding that matches the input file from the Encoding list.
    The inputJapaneseText.txt file uses the UTF-8 encoding.
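
To illustrate what these settings control, the following Python sketch reads delimited text using a row separator, a field separator, header and footer row counts, and an encoding. It is not the code Talend Studio generates. The function name parse_delimited, the separator values, and the sample rows are assumptions made for illustration only and do not reflect the actual contents of inputJapaneseText.txt.

```python
def parse_delimited(raw, encoding="utf-8", row_sep="\n", field_sep=";",
                    header=0, footer=0):
    """Decode raw bytes, split them into rows and fields, and drop
    the given number of header and footer rows.

    Hypothetical helper mirroring the Basic and Advanced settings
    described above; it is not part of any Talend API.
    """
    text = raw.decode(encoding)
    # Ignore empty rows, such as the one after a trailing row separator.
    rows = [row for row in text.split(row_sep) if row]
    stop = len(rows) - footer
    return [row.split(field_sep) for row in rows[header:stop]]


# Assumed sample input: one header row followed by two data rows.
raw = "id;text\n1;東京都に住んでいます\n2;日本語のテキスト\n".encode("utf-8")

records = parse_delimited(raw, encoding="utf-8", header=1)
for record in records:
    print(record)

# Decoding the same bytes with a mismatched encoding does not fail,
# but the Japanese text no longer matches the original.
garbled = parse_delimited(raw, encoding="latin-1", header=1)
```

With a mismatched encoding the rows are still split correctly, but the Japanese characters come out garbled, which is why the encoding selected in step 8 must match the file.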