Scenario 1: Generating stems for a list of English words - 6.1

Talend Components Reference Guide

This basic scenario describes a four-component Job that reads a list of English words from a one-column delimited file, extracts the stems of the words, and displays both the list of words and the corresponding stems on the Run console.

Setting up the Job

  1. Drop the following components from the Palette onto the design workspace: tFileInputDelimited, tMap, tStem, and tLogRow.

  2. Link the tFileInputDelimited component to the tMap component using a Row > Main connection.

  3. Link the tMap component to the tStem component using a Row > Main connection, and name the output row connection (out in this example).

  4. Link the tStem component to the tLogRow component using a Row > Main connection.

Configuring the components

  1. Double-click the tFileInputDelimited component to open its Basic settings view.

  2. Browse to the input file, and set the basic properties based on the structure of the input file. In this example, the input file provides a list of English words in different variant forms, and does not have a header. The following is an extract of the file content.

    computerize
    computerized
    computerizing
    program
    programming
    cooking
    cooked
    cooks
    evaporable

  3. Click the [...] button next to Edit schema to open the [Schema] dialog box, and set the input schema, which should contain one column named Word in this example.

    When done, click OK to close the dialog box.

  4. Double-click the tMap component to open the map editor. We will use this component to map the single-column input flow to a two-column data flow to feed the tStem component.

  5. Click the [+] button to add two columns to the output schema, and name them Fullform and Stem. Then drag the Word column from the input table onto the Fullform column, and again onto the Stem column, in the output table, so that both output columns are fed by Word.

    When done, click OK to close the map editor and propagate the changes to the next component.

  6. Double-click the tStem component to open its Basic settings view.

  7. In the Select Algorithm table, click in the Algorithm field for the Stem column, which will carry the word stems extracted from the input data, and select English as the algorithm language.

  8. Double-click the tLogRow component to open its Basic settings view, and select the Table option for a more readable display of the Job execution result.
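
As a rough illustration of what the configuration above sets up, the data flow can be sketched in Python. This is a minimal sketch, not the code that Talend generates: the function names are hypothetical, the semicolon separator is an assumption (it has no effect on a one-column file), and toy_stem is a crude placeholder rather than the English algorithm that tStem applies, so its output will differ from that of the Job.

```python
import csv
import io

# A few words from the input file extract; the file has no header row.
SAMPLE = "computerize\ncomputerized\ncooking\ncooks\n"


def read_words(text):
    """Rough counterpart of tFileInputDelimited: one column named Word."""
    # The ";" separator is an assumption; a one-column file never uses it.
    reader = csv.reader(io.StringIO(text), delimiter=";")
    return [{"Word": row[0]} for row in reader if row]


def map_row(row):
    """Rough counterpart of tMap: copy Word into both output columns."""
    return {"Fullform": row["Word"], "Stem": row["Word"]}


def toy_stem(word):
    """Hypothetical placeholder stemmer, NOT the algorithm used by tStem."""
    for suffix in ("ing", "ed", "s"):
        # Strip the first matching suffix if at least 3 characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_columns(row, columns, stem=toy_stem):
    """Rough counterpart of tStem: rewrite only the selected columns."""
    return {key: (stem(value) if key in columns else value)
            for key, value in row.items()}


rows = [stem_columns(map_row(r), {"Stem"}) for r in read_words(SAMPLE)]
for row in rows:
    print(row["Fullform"], row["Stem"])
```

The point of the sketch is that only the column selected in the tStem settings (Stem) is rewritten, while Fullform keeps the original word.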

Executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 or click the Run button on the Run tab to execute the Job.

    The list of words read from the input data and their corresponding stems are displayed on the Run console.
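
For comparison, a table-style display of such a two-column flow can be sketched as follows. This is only an approximation under stated assumptions: format_table is a hypothetical helper, the layout is not the exact one that tLogRow prints, and the stem values in the demo rows are assumed for illustration rather than copied from a run of this Job.

```python
def format_table(rows, columns):
    """Render rows (dicts of strings) as a plain-text table with aligned cells."""
    # Each column is as wide as its longest cell, header included.
    widths = [max([len(c)] + [len(r[c]) for r in rows]) for c in columns]

    def line(cells):
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return "|" + "|".join(padded) + "|"

    rule = "+" + "+".join("-" * width for width in widths) + "+"
    body = [line([r[c] for c in columns]) for r in rows]
    return "\n".join([rule, line(columns), rule, *body, rule])


# Demo rows: the Stem values are assumed for illustration only.
demo_rows = [
    {"Fullform": "cooking", "Stem": "cook"},
    {"Fullform": "cooked", "Stem": "cook"},
]
table = format_table(demo_rows, ["Fullform", "Stem"])
print(table)
```

Padding every cell to the width of its column is what keeps each word visually aligned with its stem.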