Scenario 2: Creating a synonym index for people names using tMap - 6.1

Talend Components Reference Guide

In this scenario, a four-component Job creates an index that stores people's first names and their corresponding nicknames.

The source data to be used in this scenario is stored in a .csv file, an extract of which is shown below:

Country;FirstName;Nickname1;Nickname2;Nickname3;Nickname4
France;Anne;Ninon;Annie;Ninette;Ann
France;Bernadette;Nad;Netty;Dadette
France;Albert;Al
France;Alexandre;Alex
France;Alfred-Hubert;Alu
France;Andrew;Andy
France;Anthony;Anton;Tony;Tonio
France;Artus;Artie
France;Benoit;Ben
France;Catherine;Cate;Katherine;Kathryn
France;Charles;Charlie;Charlot;Chuck
France;Christophe;Christian;Chris;Kris;Kristof
France;Christian;Chris

This data describes people's home country (not to be inserted into the index), first names (reference entries) and frequently used nicknames (synonyms).
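The roles of the fields in one source line can be sketched in Java as follows. This is a minimal illustration only: the `SourceLine` class and its `parse` method are hypothetical names invented here, not part of Talend, and the sketch assumes that every field after the second one is an optional nickname.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a Talend class.
final class SourceLine {
    final String country;          // read from the file but not indexed
    final String firstName;        // reference entry
    final List<String> nicknames;  // synonyms, possibly fewer than four

    SourceLine(String country, String firstName, List<String> nicknames) {
        this.country = country;
        this.firstName = firstName;
        this.nicknames = nicknames;
    }

    static SourceLine parse(String line) {
        // A limit of -1 keeps trailing empty fields, so "a;b;;" still yields four fields.
        String[] fields = line.split(";", -1);
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected at least Country;FirstName: " + line);
        }
        List<String> nicknames = new ArrayList<>();
        for (int i = 2; i < fields.length; i++) {
            String nickname = fields[i].trim();
            if (!nickname.isEmpty()) {
                nicknames.add(nickname); // assumption: empty nickname fields are ignored
            }
        }
        return new SourceLine(fields[0].trim(), fields[1].trim(), nicknames);
    }

    public static void main(String[] args) {
        SourceLine row = parse("France;Bernadette;Nad;Netty;Dadette");
        // Prints: Bernadette -> [Nad, Netty, Dadette]
        System.out.println(row.firstName + " -> " + row.nicknames);
    }
}
```

In the Job itself, this splitting is done by tFileInputDelimited according to its schema; the sketch only makes the entry and synonym roles explicit.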

The four components used in this Job are:

  • tFileInputDelimited: this component reads the source data and passes it to tMap.

  • tMap: this component transforms the source data into two separate columns, one holding the first names and the other the nicknames, while dropping the home country information.

  • tSynonymOutput: this component creates the index of interest in this scenario and feeds it with the synonyms given in the source file.

  • tLogRow: this component lists the data that have been inserted into the newly created index.

Setting up the Job

To replicate this scenario, proceed as follows:

  1. Drop tFileInputDelimited, tMap, tSynonymOutput and tLogRow from the Palette onto the design workspace.

    You can change the displayed name of each of these components. For further information, see the Talend Studio User Guide.

  2. Right-click the tFileInputDelimited component to open the contextual menu, and select Row > Main to connect it with the tMap component.

  3. Do the same to connect tMap to tSynonymOutput using a Row > Main link.

    A dialog box pops up to prompt you to name this link you are creating.

  4. Type in a name for the link, synonyms in this example, then click OK to validate the name and close the dialog box.

  5. Connect tSynonymOutput to tLogRow using a Row > Main link as well.

Configuring the components

Configure the data input

  1. Double-click tFileInputDelimited to open its Component view.

  2. In the File name/Stream field, specify the path to the input file.

  3. Click the [...] button next to Edit schema to open the [Schema] dialog box. Click the [+] button to add six columns and name them Country, FirstName, Nickname1, Nickname2, Nickname3 and Nickname4, matching the structure of the input file.

    When done, click OK to close the dialog box and propagate the schema setting to the next component.

    You can also configure tFileInputDelimited from metadata already stored in the Repository, which applies the corresponding configuration automatically. For further information about how to create and use this metadata, see the Talend Studio User Guide.

Configure data structure transformation

  1. Double-click tMap to open the map editor.

  2. In the bottom-right panel (synonyms) of the Schema editor view, click the [+] button to add two rows and name them FirstName and Nicknames. These two rows appear as columns in the synonyms table on the right side of the map editor.

  3. On the input side (left) of the upper part, select the FirstName column and drop it onto the FirstName column on the output side (right).

  4. In the Expression field of the Nicknames column on the output side (right), type in DqStringHandling.safeConcat('|',).

  5. On the input side (left) of the upper part, select the columns Nickname1 to Nickname4 in sequence and drop them onto the Nicknames column, then edit the expression in the Expression field so that it reads DqStringHandling.safeConcat('|', row1.Nickname1, row1.Nickname2, row1.Nickname3, row1.Nickname4).

  6. Click OK to validate these changes and accept the propagation prompted by the dialog box that pops up.
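The effect of the Nicknames expression, several nickname columns merged into one pipe-separated value, can be approximated with the standalone sketch below. The `joinNicknames` method is a hypothetical stand-in written for this illustration; it is not the source of DqStringHandling.safeConcat. In particular, skipping null or blank values is an assumption made here, since this scenario does not describe how the routine treats missing nicknames.

```java
import java.util.StringJoiner;

// Hypothetical stand-in for illustration; not the Talend routine itself.
final class NicknameJoin {

    // Joins the given values with the separator.
    // Assumption: null or blank values (missing nicknames) are skipped.
    static String joinNicknames(String separator, String... values) {
        StringJoiner joiner = new StringJoiner(separator);
        for (String value : values) {
            if (value != null && !value.trim().isEmpty()) {
                joiner.add(value.trim());
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Bernadette has three nicknames, so the fourth column is empty.
        // Prints: Nad|Netty|Dadette
        System.out.println(joinNicknames("|", "Nad", "Netty", "Dadette", null));
    }
}
```

Under these assumptions, each row leaves tMap with a FirstName value and a single Nicknames string that uses | as the separator between synonyms.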

Configure index creation and console output

  1. Double-click tSynonymOutput to open its Basic settings view.

  2. In the Index path field, type in or browse to the location where you need to create the index.

  3. In the Operation field, select the operation to perform on the index and its synonyms. In this example, select (Delete and) initialize an index.

  4. In the Entry field, select the column to be used to receive and store the reference entries. In this scenario, the FirstName column holds the reference entries, so select FirstName.

  5. In the Synonyms field, select the column to be used to receive and store the synonyms. In this scenario, select Nicknames.

  6. In the Basic settings view of the tLogRow component, select the Table option for a more readable display of the Job execution result.
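The relationship between the Entry and Synonyms columns can be pictured with a small in-memory model. This is a conceptual sketch only: the `SynonymLookupSketch` class is hypothetical, and it makes no claim about how tSynonymOutput stores or searches the index on disk. It assumes the synonyms arrive as one pipe-separated string per entry, as produced by the tMap expression in this scenario.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Conceptual model only; not how tSynonymOutput is implemented.
final class SynonymLookupSketch {
    // Reference entry -> its synonyms, kept in insertion order.
    private final Map<String, List<String>> synonymsByEntry = new LinkedHashMap<>();

    // Registers one reference entry with its pipe-separated synonyms.
    void put(String entry, String pipeDelimitedSynonyms) {
        List<String> synonyms = new ArrayList<>();
        // Pattern.quote is needed because '|' is a regex metacharacter.
        for (String synonym : pipeDelimitedSynonyms.split(Pattern.quote("|"))) {
            if (!synonym.isEmpty()) {
                synonyms.add(synonym);
            }
        }
        synonymsByEntry.put(entry, synonyms);
    }

    // Returns every reference entry that lists the given synonym.
    List<String> entriesFor(String synonym) {
        List<String> entries = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : synonymsByEntry.entrySet()) {
            if (e.getValue().contains(synonym)) {
                entries.add(e.getKey());
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        SynonymLookupSketch index = new SynonymLookupSketch();
        index.put("Christophe", "Christian|Chris|Kris|Kristof");
        index.put("Christian", "Chris");
        // Prints: [Christophe, Christian]
        System.out.println(index.entriesFor("Chris"));
    }
}
```

The example uses two rows from the source extract to show that one nickname can point to more than one reference entry.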

Executing the Job

  • Press F6 to run this Job.

    The index is created and you can view its contents and the entry status on the Console.