Configuring the components - 7.0

Text standardization

Author: Talend Documentation Team
Version: 7.0
Products: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Platform: Talend Studio

Procedure

  1. Double-click the tFileInputDelimited component to open its Basic settings view.
  2. Browse to the input file, and set the basic properties to match its structure. In this example, the input file provides a list of English words in different variant forms, and does not have a header. The following is an extract of the file content.
    computerize
    computerized
    computerizing
    program
    programming
    cooking
    cooked
    cooks
    evaporable
  3. Click the [...] button next to Edit schema to open the [Schema] dialog box, and set the input schema, which should contain one column named Word in this example.
    When done, click OK to close the dialog box.
  4. Double-click the tMap component to open the map editor. We will use this component to map the single-column input flow to a two-column data flow to feed the tStem component.
  5. Click the [+] button to add two columns to the output schema, and name them Fullform and Stem respectively. Then drag the Word column from the input table onto the Fullform column, and again onto the Stem column, of the output table.
    When done, click OK to close the map editor and propagate the changes to the next component.
  6. Double-click the tStem component to open its Basic settings view.
  7. In the Select Algorithm table, click in the Algorithm field for the Stem column, which will carry the word stems extracted from the input data, and select English as the algorithm language.
  8. Double-click the tLogRow component to open its Basic settings view, and select the Table option for a more readable display of the Job execution result.
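The data flow configured in the steps above can be sketched outside the Studio as a short Python script. This is only an illustration of how the columns move between components, not the code that the Job generates. The function names are hypothetical, and toy_stem is a deliberately naive suffix stripper standing in for the English algorithm selected in tStem, so the stems it produces will not necessarily match the actual Job output.

```python
import csv
import io

# Sample input from the example: one word per line, no header row.
SAMPLE = """computerize
computerized
computerizing
program
programming
cooking
cooked
cooks
evaporable
"""


def toy_stem(word):
    """Naive suffix stripping. An assumption for illustration only:
    this is NOT the English stemming algorithm that tStem applies."""
    for suffix in ("ing", "ed", "s"):
        # Strip the first matching suffix, keeping at least three characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def read_words(text):
    # tFileInputDelimited step: a single column named Word, no header.
    return [row[0] for row in csv.reader(io.StringIO(text)) if row]


def map_rows(words):
    # tMap step: copy Word into both the Fullform and the Stem column.
    return [{"Fullform": w, "Stem": w} for w in words]


def stem_rows(rows, stem=toy_stem):
    # tStem step: only the Stem column is rewritten; Fullform is untouched.
    return [{"Fullform": r["Fullform"], "Stem": stem(r["Stem"])} for r in rows]


def format_table(rows):
    # tLogRow step with the Table option: a simple aligned text table.
    headers = ["Fullform", "Stem"]
    widths = [max([len(h)] + [len(r[h]) for r in rows]) for h in headers]

    def line(cells):
        return "|" + "|".join(c.ljust(w) for c, w in zip(cells, widths)) + "|"

    return "\n".join([line(headers)] + [line([r[h] for h in headers]) for r in rows])


rows = stem_rows(map_rows(read_words(SAMPLE)))
print(format_table(rows))
```

Duplicating the Word column before stemming keeps the original form in each row, so the table shows each full form next to the stem derived from it.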