Configuring the components - 6.5

Synonym index

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

Procedure

  1. Double-click tFixedFlowInput to open its Basic settings view.
  2. Next to the Schema field, click the Edit schema button to open the [Schema] dialog box, add one column and name it FIRSTNAME. When done, click OK to validate these changes and close the dialog box.
  3. In the Mode area, select the Use Inline Content (delimited file) option, and supply the following names in the Content field:
    Kristof
    Chris
    Tony
    Anton
  4. Double-click tSynonymSearch to open its Basic settings view.
  5. Click Sync columns to add the schema columns of its preceding component to the default schema columns of tSynonymSearch.
    When prompted, click Yes to propagate the changes to the next component.
  6. Click the [...] button next to Edit schema to open the [Schema] dialog box, and add one column to the output schema: matched_fname.
    This column will hold the matched reference entries in the output flow.
    When done, click OK to validate the setting and accept propagating the changes when prompted.
  7. In the Limit of each group field, type in 5 to replace the default value.
  8. Under the Columns to search table, click the [+] button to add one row and define the parameters as follows:
    • In the Input column column, select FIRSTNAME from the list of the input columns.

    • In the Reference output column column, select matched_fname from the list of the output columns.

    • In the Index path column, type in the path to the synonym index to be used, between double quotation marks.

    • In the Search mode column, select Match all fuzzy. This matches each word of the input string against similar words in the index string.

    • In the Score threshold column, enter 0.9 to filter the results and list only the terms whose similarity score is higher than this threshold.

    • In the Max edits column, select 1 as the allowed edit distance.

      With a max edit distance of 1, only one insertion, deletion or substitution is allowed. Any term within that edit distance from the input data is matched.

    • Leave the Word distance column as it is, because it applies only to the Match partial mode.

    • In the Limit column, leave the default value 5.

  9. In the Basic settings view of the tLogRow component, select the Table option for a more readable display of the Job execution result.
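
To make the effect of the Max edits and Limit settings more concrete, the following is a minimal Python sketch of fuzzy matching with an edit distance of 1. It is an illustration only, not the implementation used by tSynonymSearch. The index content (SYNONYM_INDEX) and the function names are hypothetical, and the Score threshold filtering is not modeled here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical index content (reference entry -> synonyms), invented for this sketch.
SYNONYM_INDEX = {
    "Christopher": ["Chris", "Kristoff"],
    "Anthony": ["Toni", "Anton"],
}


def search(name, index=SYNONYM_INDEX, max_edits=1, limit=5):
    """Return up to `limit` reference entries that have a synonym within `max_edits` of `name`."""
    matches = []
    for reference, synonyms in index.items():
        if any(edit_distance(name.lower(), s.lower()) <= max_edits for s in synonyms):
            matches.append(reference)
    return matches[:limit]


# The four first names supplied in the Content field of tFixedFlowInput.
for first_name in ["Kristof", "Chris", "Tony", "Anton"]:
    print(first_name, "->", search(first_name))
```

In this sketch, Kristof matches the hypothetical synonym Kristoff (one insertion) and Tony matches Toni (one substitution). With max_edits set to 0, only exact synonyms would match.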