Soundex frequency statistics - Cloud - 8.0

Talend Studio User Guide

Last publication date: 2024-02-13
Available in:

  • Big Data Platform
  • Cloud API Services Platform
  • Cloud Big Data Platform
  • Cloud Data Fabric
  • Cloud Data Management Platform
  • Data Fabric
  • Data Management Platform
  • Data Services Platform
  • MDM Platform
  • Real-Time Big Data Platform

Indicators in this group use the Soundex algorithm built into the DBMS.

These indicators index records by sound. Records with the same English pronunciation are encoded to the same representation, so they can be matched despite minor differences in spelling.

  • Soundex Frequency: computes the number of the most frequent distinct records relative to the total number of records that have the same pronunciation.
  • Soundex Low Frequency: computes the number of the least frequent distinct records relative to the total number of records that have the same pronunciation.
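
To make the encoding and the grouping more concrete, here is a minimal Python sketch. It assumes the classic American Soundex rules (first letter kept, consonants mapped to digits 1 to 6, code padded to four characters); the Soundex function built into a given DBMS may differ in details. The names soundex and soundex_frequency are hypothetical helpers written for this illustration, and the per-group counts are one plausible reading of the indicator descriptions above, not the Studio implementation.

```python
from collections import Counter, defaultdict

# Letter-to-digit table of classic American Soundex (assumed variant).
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(value):
    """Return the 4-character Soundex code of value, or "" if it has no A-Z letters."""
    letters = [c for c in value.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        # Append a digit only when it differs from the previous letter's digit.
        if digit and digit != previous:
            code += digit
        # H and W do not separate two letters that share a digit; vowels do.
        if ch not in "HW":
            previous = digit
    return (code + "000")[:4]


def soundex_frequency(values):
    """Group values by Soundex code.

    Returns (code, total records, distinct records) tuples,
    sorted by total records in descending order, then by code.
    """
    groups = defaultdict(Counter)
    for value in values:
        groups[soundex(value)][value] += 1
    rows = [(code, sum(counter.values()), len(counter))
            for code, counter in groups.items()]
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

With this sketch, "Smith" and "Smyth" both encode to S530, so soundex_frequency(["Smith", "Smyth", "Smith", "Lee"]) reports one group with three records and two distinct spellings, followed by a one-record group for "Lee". Reading the same list from the other end gives the least frequent groups.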

To use Soundex frequency statistics indicators on PostgreSQL, Amazon for PostgreSQL and Amazon Redshift, install the fuzzystrmatch extension in the PostgreSQL database by running the CREATE EXTENSION fuzzystrmatch; query.

For more information, see the PostgreSQL documentation.

To use Soundex frequency statistics indicators on Amazon Redshift, you can also create a custom user-defined function.

For more information, see Creating user-defined functions.

On Snowflake, you can use Soundex frequency statistics indicators only with the Java engine.

Chinese characters are supported only by the SQL engine.

Due to limitations in the Teradata Soundex implementation, you may not be able to drill down into the results when profiling Teradata data with these indicators.

The following table shows the indicators that you can select in any database:

Indicator                   | Supported data types with the Java analysis engine | Supported data types with the SQL analysis engine
Soundex Frequency Table     | Text                                               | Text
Soundex Low Frequency Table | Text                                               | Text