In this scenario, a three-component Job creates an index of standardized city names, each of which references the city synonyms found in the client data of an enterprise.
To create this index, you need a source file that provides the city names and their corresponding synonyms. In this scenario, the source is a .csv file with the following content:
CityName;Synonyms
North Reading;Redding|North Reading|N. Reading|N Reading|N Redding|NR
Young America;YA|Young America
Dedham;Dedham|dedham|deadham
New York;NY|New York
This file contains two columns, separated by a semicolon (;):
the left one, CityName, holds the standard city names used as reference data.
the right one, Synonyms, holds the synonyms collected across the client data of this enterprise, separated from each other by a pipe (|).
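To make the file layout concrete, the following Python sketch parses the sample content into a mapping from each standard city name to its list of synonyms. It is only an illustration of the format, not part of the Talend Job; it assumes that the first line is a header row, that a semicolon separates the two columns, and that a pipe separates the synonyms. The function name parse_city_synonyms is hypothetical.

```python
# Sample content of the source file used in this scenario.
SOURCE = """CityName;Synonyms
North Reading;Redding|North Reading|N. Reading|N Reading|N Redding|NR
Young America;YA|Young America
Dedham;Dedham|dedham|deadham
New York;NY|New York"""


def parse_city_synonyms(text: str) -> dict[str, list[str]]:
    """Map each standard city name to its list of synonyms.

    Assumes ';' separates the two columns, '|' separates synonyms,
    and the first line is a header row (hypothetical helper).
    """
    lines = text.strip().splitlines()
    result: dict[str, list[str]] = {}
    for line in lines[1:]:  # skip the CityName;Synonyms header row
        city_name, synonyms = line.split(";", 1)
        result[city_name] = synonyms.split("|")
    return result


if __name__ == "__main__":
    for city, syns in parse_city_synonyms(SOURCE).items():
        print(f"{city}: {len(syns)} synonyms")
```

Splitting on the first semicolon only (maxsplit of 1) keeps the whole synonym list together even if a synonym were to contain a semicolon.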
The three components used in this Job are:
tFileInputDelimited: this component reads the data from the source file and passes it to tSynonymOutput.
tSynonymOutput: this component creates the index of interest in this scenario and feeds it with the synonyms given in the source file.
tLogRow: this component lists the data that have been inserted into the newly created index.
To replicate this scenario, proceed as follows:
Drop tFileInputDelimited, tSynonymOutput and tLogRow from the Palette onto the design workspace.
You can change the displayed name of each of these components, as has been done for the tFileInputDelimited component, which appears as CityNames in this scenario. For further information, see Talend Studio User Guide.
Right-click the tFileInputDelimited (CityNames) component to open the contextual menu.
From this menu, select Row > Main.
Click the tSynonymOutput component to create a connection between these two components.
Do the same thing to connect tSynonymOutput to tLogRow.
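Conceptually, the two Row > Main connections pass each row from one component to the next. The following Python sketch models that flow with chained generators. The functions city_names, synonym_output and log_row are hypothetical stand-ins for the three components, not Talend APIs, and the "index" here is just a list.

```python
from typing import Iterable, Iterator

Row = tuple[str, str]  # (CityName, Synonyms)


def city_names(lines: Iterable[str]) -> Iterator[Row]:
    """Stand-in for tFileInputDelimited: emit one row per data line."""
    for line in lines:
        city_name, synonyms = line.split(";", 1)
        yield city_name, synonyms


def synonym_output(rows: Iterable[Row], sink: list[Row]) -> Iterator[Row]:
    """Stand-in for tSynonymOutput: store each row, then pass it downstream."""
    for row in rows:
        sink.append(row)
        yield row


def log_row(rows: Iterable[Row]) -> list[Row]:
    """Stand-in for tLogRow: print each row it receives and collect it."""
    logged: list[Row] = []
    for row in rows:
        print(row)
        logged.append(row)
    return logged


if __name__ == "__main__":
    data = ["Young America;YA|Young America", "New York;NY|New York"]
    index: list[Row] = []
    log_row(synonym_output(city_names(data), index))
```

Because the stages are generators, each row travels through all three stages before the next line is read, which mirrors the row-by-row behavior of a Main connection.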
Double-click tFileInputDelimited (CityNames) to open its Basic settings view.
In the File name/Stream field, specify the path to the input file.
Click the [...] button next to Edit schema to open the [Schema] dialog box, click the [+] button twice to add two columns, and name them CityName and Synonyms respectively, to match the input file structure.
When done, click OK to close the dialog box and propagate the schema setting to the next component.
You can also set up this tFileInputDelimited component using the established metadata stored in the Repository, so that the configuration of the corresponding metadata is applied automatically. For further information about how to create and use this metadata, see Talend Studio User Guide.
Double-click tSynonymOutput to open its Basic settings view.
In the Index path field, type in or browse to the location where you need to create the index.
In the Operation field, select the operation to perform on the index and its related synonyms. In this example, select (Delete and) initialize an index.
In the Entry field, select the column to be used to receive and store the standard reference data. In the source file used in this scenario, the CityName column holds the standard city names, so select CityName.
In the Synonyms field, select the column to be used to receive and store the synonyms. In this scenario, select Synonyms.
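The internal format of the index that tSynonymOutput creates is not described here. As a purely conceptual analogy, the following Python sketch shows the roles of the settings above: initialize() plays the part of the (Delete and) initialize an index operation, each entry comes from the Entry column (CityName), and its synonyms come from the Synonyms column. The class SynonymIndex is hypothetical, and its case-insensitive lookup is an assumption made for the illustration, not documented Talend behavior.

```python
class SynonymIndex:
    """Toy in-memory analogy of a synonym index (hypothetical, not Talend's).

    Each standard entry maps to the synonyms that refer to it.
    """

    def __init__(self) -> None:
        self._entries: dict[str, list[str]] = {}

    def initialize(self) -> None:
        """Analogue of '(Delete and) initialize an index': drop all content."""
        self._entries.clear()

    def add(self, entry: str, synonyms: str, separator: str = "|") -> None:
        """Store one row: the Entry value and its separator-delimited synonyms."""
        self._entries[entry] = synonyms.split(separator)

    def lookup(self, term: str) -> list[str]:
        """Return the entries whose synonyms match term (case-insensitive here)."""
        needle = term.lower()
        return [
            entry
            for entry, syns in self._entries.items()
            if any(s.lower() == needle for s in syns)
        ]


if __name__ == "__main__":
    idx = SynonymIndex()
    idx.initialize()
    idx.add("North Reading", "Redding|North Reading|N. Reading|N Reading|N Redding|NR")
    idx.add("Dedham", "Dedham|dedham|deadham")
    print(idx.lookup("NR"))
```

The point of the analogy is the direction of the reference: a synonym found in client data, such as NR, leads back to the standard name North Reading.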
In the Basic settings view of the tLogRow component, select the Table option for a more readable display of the Job execution result.
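The Table option prints the rows as an aligned table rather than as delimited text. The following Python sketch shows the general idea of such a display: it computes a width per column and pads each cell. The function format_table is hypothetical, and its border style is only an approximation, not the exact layout tLogRow produces.

```python
def format_table(header: list[str], rows: list[list[str]]) -> str:
    """Render rows as a bordered, column-aligned text table (approximation)."""
    # Width of each column = longest cell in that column, header included.
    widths = [max(len(str(cell)) for cell in column) for column in zip(header, *rows)]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def render(cells: list[str]) -> str:
        # Left-align every cell to its column width.
        return "| " + " | ".join(str(c).ljust(w) for c, w in zip(cells, widths)) + " |"

    lines = [border, render(header), border]
    lines += [render(row) for row in rows]
    lines.append(border)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(
        ["CityName", "Synonyms"],
        [["New York", "NY|New York"], ["Dedham", "Dedham|dedham|deadham"]],
    ))
```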