Creating a synonym index for city names - Cloud - 8.0


Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Talend Studio

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real-Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform, and Talend Data Fabric.

For more technologies supported by Talend, see Talend components.

In this scenario, a three-component Job creates an index of standardized city names, each linked to the synonyms for that city found in the client data of an enterprise.

To create this index, you need a source file that provides the city names and their corresponding synonyms. In this scenario, the source is a semicolon-delimited .csv file that reads as follows:

North Reading;Redding|North Reading|N. Reading|N Reading|N Redding|NR
Young America;YA|Young America
New York;NY|New York

Two columns are found in this file:

  • the left one is the CityName column, which holds the standard city names used as reference data.

  • the right one is the Synonyms column, which holds the synonyms collected across the client data of this enterprise, separated by the pipe character (|).
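To make the file layout concrete, here is a minimal Python sketch that splits the sample into the two columns. It is not Talend code and plays no part in the Job; the function name `read_city_synonyms` and the `SAMPLE` constant are hypothetical, introduced only for illustration.

```python
import csv
import io

# Sample content copied from the scenario's source file.
SAMPLE = """\
North Reading;Redding|North Reading|N. Reading|N Reading|N Redding|NR
Young America;YA|Young America
New York;NY|New York
"""


def read_city_synonyms(stream):
    """Yield (city_name, synonyms) pairs from a semicolon-delimited stream."""
    for record in csv.reader(stream, delimiter=";"):
        if not record:
            continue  # skip blank lines
        city_name, synonyms = record
        # The Synonyms column packs several values separated by "|".
        yield city_name, synonyms.split("|")


rows = list(read_city_synonyms(io.StringIO(SAMPLE)))
for city_name, synonyms in rows:
    print(city_name, "->", synonyms)
```

Each row yields one standard city name and the list of synonyms that should resolve to it.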

The three components used in this Job are:

  • tFileInputDelimited: this component loads the data from the source file and passes it to tSynonymOutput.

  • tSynonymOutput: this component creates the index of interest in this scenario and feeds it with the synonyms given in the source file.

  • tLogRow: this component lists the data that have been inserted into the newly created index.
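As a rough analogy for what the Job accomplishes, the Python sketch below builds an in-memory lookup from each synonym to its standard city name and prints the rows that were inserted. This is an illustration only, not how tSynonymOutput is implemented: the dictionary stands in for the real index, the lower-casing of keys is an assumption made here, and the names `ROWS` and `build_index` are hypothetical.

```python
# Rows as (CityName, Synonyms) pairs, copied from the sample source file.
ROWS = [
    ("North Reading", "Redding|North Reading|N. Reading|N Reading|N Redding|NR"),
    ("Young America", "YA|Young America"),
    ("New York", "NY|New York"),
]


def build_index(rows):
    """Map each synonym to its standard city name.

    Keys are lower-cased so lookups ignore case (an assumption of this sketch).
    """
    index = {}
    for city_name, synonyms in rows:
        for synonym in synonyms.split("|"):
            index[synonym.strip().lower()] = city_name
    return index


index = build_index(ROWS)

# Comparable in spirit to the logging step: list what went into the index.
for city_name, synonyms in ROWS:
    print(f"{city_name}|{synonyms}")

# A variant found in client data resolves to the reference name.
print(index["n. reading"])
```

A lookup such as `index["ny"]` returns the reference name for that row, which is the kind of standardization the synonym index is meant to support.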