Soundex frequency statistics - 7.1

Talend Real-time Big Data Platform Studio User Guide

Author: Talend Documentation Team
Version: 7.1
Product: Talend Real-Time Big Data Platform
Task: Design and Development
Platform: Talend Studio

Indicators in this group use the Soundex algorithm built into the DBMS.

They index records by sound. Records with the same pronunciation (English pronunciation only) are encoded to the same representation, so they can be matched despite minor differences in spelling.
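To illustrate how such an encoding groups similar-sounding values, here is a minimal Python sketch of the classic American Soundex rules. It is not the code Talend or any DBMS runs: the function name `soundex` and the table `CODES` are hypothetical, and database implementations may differ in details such as code length or the handling of H and W.

```python
# Classic Soundex digit table: consonants with similar sounds share a digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character Soundex code (illustrative sketch only)."""
    # Keep only plain Latin letters; everything else is ignored.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    result = first
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    # Pad with zeros or truncate to exactly four characters.
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Smith"), soundex("Smyth"))    # S530 S530
```

In this sketch, "Robert" and "Rupert" receive the same code, as do "Smith" and "Smyth", which is why records with different spellings can be matched.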

  • Soundex Frequency: computes the number of the most frequent distinct records relative to the total number of records that have the same pronunciation.
  • Soundex Low Frequency: computes the number of the least frequent distinct records relative to the total number of records that have the same pronunciation.
Note: Due to a limitation in the Teradata Soundex implementation, you may not be able to drill down into the results when profiling Teradata with this indicator. For further information, see Teradata error: "Invalid Input: only Latin letters allowed".
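As a rough illustration of what the two indicators report, the following Python sketch groups records by a Soundex code and counts, for each code, the total number of records and the number of distinct values. The exact query the Studio generates is not shown here, so the grouping, the sort order, and the name `soundex_frequency` are assumptions; the codes in the sample data are standard Soundex codes hard-coded to stand in for the DBMS output.

```python
from collections import defaultdict


def soundex_frequency(rows, low=False):
    """Summarize (value, soundex_code) pairs per code.

    Returns tuples of (code, total_count, distinct_count), most frequent
    first, or least frequent first when low=True. Assumed semantics only.
    """
    groups = defaultdict(list)
    for value, code in rows:
        groups[code].append(value)
    table = [(code, len(values), len(set(values)))
             for code, values in groups.items()]
    # Order by distinct count, then by total count.
    table.sort(key=lambda row: (row[2], row[1]), reverse=not low)
    return table


# Hypothetical sample: each value is paired with the code a DBMS might return.
rows = [("Smith", "S530"), ("Smyth", "S530"), ("Smith", "S530"),
        ("Robert", "R163"), ("Rupert", "R163"), ("Lee", "L000")]

print(soundex_frequency(rows))
# [('S530', 3, 2), ('R163', 2, 2), ('L000', 1, 1)]
print(soundex_frequency(rows, low=True))
# [('L000', 1, 1), ('R163', 2, 2), ('S530', 3, 2)]
```

Here the code S530 covers three records but only two distinct spellings, which is the kind of near-duplicate group these indicators are meant to surface.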