In this scenario, a four-component Job creates an index that stores people's first names and their related nicknames.
The source data to be used in this scenario is stored in a .csv file, an extract of which is shown below:
Country;FirstName;Nickname1;Nickname2;Nickname3;Nickname4
France;Anne;Ninon;Annie;Ninette;Ann
France;Bernadette;Nad;Netty;Dadette
France;Albert;Al
France;Alexandre;Alex
France;Alfred-Hubert;Alu
France;Andrew;Andy
France;Anthony;Anton;Tony;Tonio
France;Artus;Artie
France;Benoit;Ben
France;Catherine;Cate;Katherine;Kathryn
France;Charles;Charlie;Charlot;Chuck
France;Christophe;Christian;Chris;Kris;Kristof
France;Christian;Chris
This data describes people's home country (not to be inserted into the index), first names (reference entries) and frequently used nicknames (synonyms).
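To make the file layout concrete, the following minimal Java sketch splits one line of the extract into its country, first name and nicknames. It only illustrates the structure of the data; it is not how tFileInputDelimited is implemented, and the class and method names (NicknameRowSketch, NameRow, parseLine) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class NicknameRowSketch {

    /** Hypothetical holder for one parsed line; not a Talend class. */
    static final class NameRow {
        final String country;
        final String firstName;
        final List<String> nicknames;

        NameRow(String country, String firstName, List<String> nicknames) {
            this.country = country;
            this.firstName = firstName;
            this.nicknames = nicknames;
        }
    }

    /** Splits a semicolon-delimited line into country, first name and nicknames. */
    static NameRow parseLine(String line) {
        // The -1 limit keeps trailing empty fields, so short rows parse safely.
        String[] fields = line.split(";", -1);
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected at least Country;FirstName: " + line);
        }
        List<String> nicknames = new ArrayList<>();
        for (int i = 2; i < fields.length; i++) {
            // Rows have between one and four nicknames; skip empty slots.
            if (!fields[i].isEmpty()) {
                nicknames.add(fields[i]);
            }
        }
        return new NameRow(fields[0], fields[1], nicknames);
    }

    public static void main(String[] args) {
        NameRow row = parseLine("France;Anthony;Anton;Tony;Tonio");
        // Prints: Anthony -> [Anton, Tony, Tonio]
        System.out.println(row.firstName + " -> " + row.nicknames);
    }
}
```

Note that rows carry a variable number of nicknames, which is why the Job later merges them into a single column.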
The four components used in this Job are:
tFileInputDelimited: this component reads the source data and passes it to tMap.
tMap: this component transforms the source data into two separate columns, one for the first names and one for the nicknames, and discards the home country information.
tSynonymOutput: this component creates the index of interest in this scenario and feeds it with the synonyms given in the source file.
tLogRow: this component lists the data that have been inserted into the newly created index.
To replicate this scenario, proceed as follows:
Drop tFileInputDelimited, tMap, tSynonymOutput and tLogRow from the Palette onto the design workspace.
You can change the displayed name of each of these components. For further information, see Talend Studio User Guide.
Right-click the tFileInputDelimited component to open the contextual menu, and select Row > Main to connect it with the tMap component.
Do the same to connect tMap to tSynonymOutput using a Row > Main link.
A dialog box pops up to prompt you to name this link you are creating.
Type in synonyms, for example, then click OK to validate this name and thus close this dialog box.
Then connect tSynonymOutput to tLogRow, again using a Row > Main link.
Configure the data input
Double-click tFileInputDelimited to open its Component view.
In the File name/Stream field, specify the path to the input file.
Click the [...] button next to Edit schema to open the [Schema] dialog box, click the [+] button to add six columns and name them Country, FirstName, Nickname1, Nickname2, Nickname3 and Nickname4 corresponding to the input file structure.
When done, click OK to close the dialog box and propagate the schema setting to the next component.
You can also configure tFileInputDelimited from metadata already stored in the Repository, which lets you reuse the corresponding configuration automatically. For further information about how to create and use this metadata, see Talend Studio User Guide.
Configure data structure transformation
Double-click tMap to open the map editor.
At the bottom right corner (synonyms) of the Schema editor view, click the [+] button to add two rows and name them FirstName and Nicknames. These two columns appear in the synonyms table on the right side of the map editor.
On the input side (left) of the upper part, select the FirstName column and drop it to the FirstName column on the output side (right).
On the input side (left) of the upper part, select the columns from Nickname1 to Nickname4 in sequence and drop them onto the Nicknames column on the output side (right). Then edit the Expression field of the Nicknames column so that it reads:
DqStringHandling.safeConcat('|', row1.Nickname1, row1.Nickname2, row1.Nickname3, row1.Nickname4)
Click OK to validate these changes and accept the propagation prompted by the dialog box that pops up.
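The effect of this mapping can be pictured with the plain-Java sketch below, which joins up to four nickname values into one pipe-separated string. It is a hypothetical stand-in and not the Talend DqStringHandling routine; in particular, skipping null and empty values is an assumption made here so that rows with fewer than four nicknames produce a clean result.

```java
import java.util.StringJoiner;

public class NicknameConcatSketch {

    /**
     * Hypothetical helper: joins the given values with the separator,
     * skipping null and empty values (an assumption of this sketch).
     */
    static String joinNicknames(char separator, String... values) {
        StringJoiner joiner = new StringJoiner(String.valueOf(separator));
        for (String value : values) {
            if (value != null && !value.isEmpty()) {
                joiner.add(value);
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Bernadette has three nicknames; the fourth column is missing.
        String nicknames = joinNicknames('|', "Nad", "Netty", "Dadette", null);
        // Prints: Nad|Netty|Dadette
        System.out.println(nicknames);
    }
}
```

With this shape, each output row carries one FirstName value and one Nicknames value, matching the two columns defined in the synonyms table.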
Configure index creation and console output
Double-click tSynonymOutput to open its Basic settings view.
In the Index path field, type in or browse to the location where you need to create the index.
In the Operation field, select the operation you need to perform on this created index as well as the related synonyms. In this example, select (Delete and ) initialize an index.
In the Entry field, select the column to be used to receive and store the reference entries. In this scenario, the FirstName column is holding the reference entries, so select FirstName.
In the Synonyms field, select the column to be used to receive and store the synonyms. In this scenario, select Nicknames.
In the Basic settings view of the tLogRow component, select the Table option to display the Job execution result in a more readable form.
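As a rough mental model of the settings above, the sketch below keeps reference entries and their synonyms in an in-memory map and looks up entries by synonym. This is only a conceptual illustration under simplifying assumptions: the index written by tSynonymOutput is stored on disk at the Index path, and every name used here (SynonymIndexSketch, initialize, insert, findEntries) is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SynonymIndexSketch {

    // Reference entry (FirstName) -> synonyms (Nicknames), in insertion order.
    private final Map<String, List<String>> entries = new LinkedHashMap<>();

    /** Mimics an initialize operation: any existing content is dropped. */
    void initialize() {
        entries.clear();
    }

    /** Stores one reference entry with its pipe-separated synonyms. */
    void insert(String entry, String pipeSeparatedSynonyms) {
        entries.put(entry, Arrays.asList(pipeSeparatedSynonyms.split("\\|")));
    }

    /** Returns every reference entry that lists the given synonym. */
    List<String> findEntries(String synonym) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : entries.entrySet()) {
            if (e.getValue().contains(synonym)) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SynonymIndexSketch index = new SynonymIndexSketch();
        index.initialize();
        index.insert("Christophe", "Christian|Chris|Kris|Kristof");
        index.insert("Christian", "Chris");
        // "Chris" is a nickname of both first names in the source file.
        // Prints: [Christophe, Christian]
        System.out.println(index.findEntries("Chris"));
    }
}
```

The example also shows why a synonym can map to more than one reference entry: in the source data, Chris appears as a nickname for both Christophe and Christian.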