Configuring the components - Cloud - 8.0

Synonym index

Version: Cloud 8.0
Language: English
Product: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Last publication date: 2024-02-20

Procedure

  1. Double-click tFixedFlowInput to open its Basic settings view.
  2. Next to Edit schema, click the [...] button to open the Schema dialog box, and add a second column, LASTNAME, next to the FIRSTNAME column you defined in the previous scenario.
    When done, click OK to validate the change and close the dialog box.
  3. In the Content field of the Mode area, add more first name and last name data to make the input data read as follows:
    Kristof;Toum
    Chris;Toom
    Tony;Walker
    Anton;Correia
    Jim;Correia
    Jim;Walker
  4. Double-click tSynonymSearch to open its Basic settings view.
  5. Click Sync columns to synchronize the columns of this component with the preceding one and click Yes to propagate the changes to the next component when prompted.
  6. Click the [...] button next to Edit schema to open the Schema dialog box, and add two columns to the output schema: matched_fname and matched_lname.
    These columns will hold the matched reference entries in the output flow.
    When done, click OK to validate the setting and accept propagating the changes when prompted.
  7. In the Limit of each group field, enter 10 to replace the value you defined in the previous scenario.
  8. Under the Columns to search table, click the [+] button to add a second row and define the parameters as follows:
    • In the Input column column, select LASTNAME from the drop-down list.

    • In the Reference output column column, select matched_lname from the drop-down list.

    • In the Index path column, type in, between quotation marks, the path to the synonym index holding the last name entries.

      When using Spark Local mode, use a path to a local folder:
      • Apache Spark 3.1 and earlier: prefix://file path or file:///file path.
      • Apache Spark 3.2 and later: file:///file path.
    • In the Search mode column, select Match exact for both input columns. This will match the exact input word against an exact index word.

    • In the Score threshold column, enter 0.9 to filter the results and list only the terms whose similarity score is above this value.

    • Leave the Min similarity and Word distance columns as they are: they apply only to the fuzzy modes and to the Match partial mode, respectively.

    • In the Limit column of this row, leave the default value 5.
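
The settings above can be illustrated with a small standalone Java sketch. It is not the code Talend Studio generates and does not use any Talend API. It assumes the Content field uses one row per line with ";" between fields, as the sample data suggests. The class SynonymSearchSketch and the methods parseContent and keepBest are hypothetical names introduced only for this illustration, and the strict "above the threshold" comparison is an assumption about how the Score threshold and Limit values interact.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SynonymSearchSketch {

    // Splits the Content text into rows, assuming one row per line
    // and ";" between the FIRSTNAME and LASTNAME fields.
    public static List<String[]> parseContent(String content) {
        List<String[]> rows = new ArrayList<>();
        for (String line : content.split("\\R")) {
            if (!line.trim().isEmpty()) {
                rows.add(line.trim().split(";"));
            }
        }
        return rows;
    }

    // Hypothetical filter: keeps the scores strictly above the threshold,
    // sorts them best first, and returns at most `limit` of them.
    public static List<Double> keepBest(List<Double> scores, double threshold, int limit) {
        List<Double> kept = new ArrayList<>();
        for (double score : scores) {
            if (score > threshold) {
                kept.add(score);
            }
        }
        kept.sort(Comparator.reverseOrder());
        return new ArrayList<>(kept.subList(0, Math.min(limit, kept.size())));
    }

    public static void main(String[] args) {
        String content = String.join("\n",
                "Kristof;Toum", "Chris;Toom", "Tony;Walker",
                "Anton;Correia", "Jim;Correia", "Jim;Walker");

        // Print each parsed row with the schema column names from step 2.
        for (String[] row : parseContent(content)) {
            System.out.println("FIRSTNAME=" + row[0] + ", LASTNAME=" + row[1]);
        }

        // Made-up scores, filtered with the threshold 0.9 and the limit 5 used above.
        System.out.println(keepBest(List.of(0.4, 0.95, 1.0), 0.9, 5));
    }
}
```

The scores passed to keepBest are invented for the example. In the Job itself, the scores come from the lookup that tSynonymSearch performs against the synonym index.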