This simple scenario illustrates a Job that normalizes a list of tags for Web forum topics, and displays the result in a table on the Run console.
The list is poorly organized: it contains trailing empty strings, leading and trailing whitespace, and repeated tags, as shown below.
ldap, db2, jdbc driver, grid computing, talend architecture , content, environment,, tmap,, eclipse, database,java,postgresql, tmap, database,java,sybase, deployment,, repository, database,informix,java
Drop the following components from the Palette to the design workspace: tFileInputDelimited, tNormalize, tLogRow.
Connect the components using Row > Main connections.
Double-click the tFileInputDelimited component to open its Basic settings view.
In the File name field, specify the path to the input file to be normalized.
Click the [...] button next to Edit schema to open the [Schema] dialog box, and set up the input schema by adding one column named Tags. When done, click OK to validate your schema setup and close the dialog box, leaving the rest of the settings as they are.
Double-click the tNormalize component to open its Basic settings view.
Check the schema, and if necessary, click Sync columns to get the schema synchronized with the input component.
Define the column the normalization operation is based on.
In this use case, the input schema has only one column, Tags, so just accept the default setting.
In the Advanced settings view, select the Get rid of duplicate rows from output, Discard the trailing empty strings, and Trim resulting values check boxes.
In the tLogRow component, select the Print values in the cells of table radio button.
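To make the effect of the three Advanced settings options easier to picture, the following plain-Java sketch approximates the same clean-up on a shortened sample line. It is not the code that tNormalize generates. The class name NormalizeSketch, the normalize method, and the sample line are hypothetical, and a comma is assumed as the item separator.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Talend-generated code.
class NormalizeSketch {

    // Splits one delimited value into items, one item per output row.
    static List<String> normalize(String line, String separator) {
        // split() with the default limit drops trailing empty strings,
        // similar in spirit to "Discard the trailing empty strings".
        String[] items = line.split(Pattern.quote(separator));

        // LinkedHashSet removes repeated items while keeping first-seen order,
        // similar in spirit to "Get rid of duplicate rows from output".
        Set<String> unique = new LinkedHashSet<>();
        for (String item : items) {
            // trim() strips leading and trailing whitespace,
            // similar in spirit to "Trim resulting values".
            unique.add(item.trim());
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Assumed sample: a shortened line in the style of the input file.
        String line = "ldap, db2, tmap,, tmap, database,java,,";
        for (String tag : normalize(line, ",")) {
            System.out.println("[" + tag + "]");
        }
    }
}
```

In this sketch, an empty item in the middle of the line (the `,,` after the first tmap) is kept as one empty row, because only trailing empty strings are discarded. Whether tNormalize treats interior empty items the same way is not stated in this scenario, so check the output in the Run console.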