Finding similar values - Cloud

Talend Cloud Data Preparation User Guide

author
Talend Documentation Team

To find and filter text values that look alike, for example to fix typos, use the Match Similar Text function.

This function creates a new column with the value true if the text in a cell is close enough to the reference text you provide, and false if it is not.

Procedure

  1. Select the text column where you want to find similar text.
  2. In the Functions panel, type Match Similar Text and click the result to open the options for the associated function.
  3. Fill in the options according to your needs.

    The Reference field contains the text to compare against, and the Fuzziness field sets the maximum number of characters that can be added, removed or replaced relative to the Reference. This number is called the Levenshtein distance.

    Note that the Reference field is case sensitive.

    In this example, the Reference text is "new" and the Fuzziness (Levenshtein distance) is 1. The function therefore matches words such as "few", "now", "net" or "news", which are each one edit away from "new", but not "bow", "nap" or "led", which are each two edits away.

  4. Click the Submit button to apply the function with the selected options.

Results

A new column is created, containing true for each row whose value matches the Reference within the given Fuzziness, and false for every other row.

For more information on the Levenshtein distance, see https://en.wikipedia.org/wiki/Levenshtein_distance
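
To make the Fuzziness setting more concrete, the following Python sketch computes the Levenshtein distance between two strings and checks the example words from this page against the Reference "new" with a Fuzziness of 1. It is only an illustration of the distance itself, not the code used by Talend Data Preparation: the names levenshtein and matches_similar_text are hypothetical, and the comparison is assumed to be an exact character comparison, so a difference in case counts as one edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions or substitutions needed to turn a into b."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # remove ch_a
                current[j - 1] + 1,      # add ch_b
                previous[j - 1] + cost,  # replace ch_a with ch_b, or keep it
            ))
        previous = current
    return previous[-1]


def matches_similar_text(value: str, reference: str, fuzziness: int) -> bool:
    # Hypothetical helper: True when value is at most `fuzziness` edits
    # away from reference (assumed exact, case-sensitive comparison).
    return levenshtein(value, reference) <= fuzziness


words = ["few", "now", "net", "news", "bow", "nap", "led"]
results = {word: matches_similar_text(word, "new", 1) for word in words}
for word, matched in results.items():
    print(word, matched)
```

With these inputs, the first four words are reported as matches and the last three are not, which is consistent with the example in the procedure.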