Creating a synonym index for people names using tMap - 7.3

Synonym index

Version
7.3
Language
English
Product
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Content
Data Governance > Third-party systems > Data Quality components > Standardization components > Synonym index components
Data Quality and Preparation > Third-party systems > Data Quality components > Standardization components > Synonym index components
Design and Development > Third-party systems > Data Quality components > Standardization components > Synonym index components
Last publication date
2023-06-12

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

For more technologies supported by Talend, see Talend components.

In this scenario, a four-component Job creates an index storing people names and their relative nicknames.

The source data to be used in this scenario is stored in a .csv file, an extract of which is shown below:

Country;FirstName;Nickname1;Nickname2;Nickname3;Nickname4
France;Anne;Ninon;Annie;Ninette;Ann
France;Bernadette;Nad;Netty;Dadette
France;Albert;Al
France;Alexandre;Alex
France;Alfred-Hubert;Alu
France;Andrew;Andy
France;Anthony;Anton;Tony;Tonio
France;Artus;Artie
France;Benoit;Ben
France;Catherine;Cate;Katherine;Kathryn
France;Charles;Charlie;Charlot;Chuck
France;Christophe;Christian;Chris;Kris;Kristof
France;Christian;Chris

This data describes people's home country (not to be inserted into the index), first names (reference entries) and frequently used nicknames (synonyms).

The four components used in this Job are:

  • tFileInputDelimited: this component reads the source data and inputs them to tSynonymOutput.

  • tMap: this component is used to transform the source data into two separated columns representing the first names and the nicknames, in the meantime, ignoring the people's home country information.

  • tSynonymOutput: this component creates the index of interest in this scenario and feeds it with the synonyms given in the source file.

  • tLogRow: this component lists the data that have been inserted into the newly created index.