Text statistics - Cloud

Talend Cloud API Services Platform Studio User Guide

Author: Talend Documentation Team
Version: Cloud
Product: Talend Cloud
Task: Design and Development
Platform: Talend Management Console, Talend Studio

You can use the text statistics indicators to analyze columns only if their data mining type is set to nominal in the analysis editor. Otherwise, these statistics are grayed out in the Indicator Selection dialog box. For further information on the available data mining types, see Data mining types.

Text statistics analyze the characteristics of textual fields in the columns, including minimum, maximum and average length.

  • Minimal Length: computes the minimal length of a text field, excluding null and blank values.
  • Maximal Length: computes the maximal length of a text field, excluding null and blank values.
  • Average Length: computes the average length of a text field, excluding null and blank values.
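The three base indicators can be sketched in Python as follows. This is a minimal sketch, not the Studio's implementation. It assumes that a null value is represented as `None` and that a blank value is a string containing only whitespace, which also covers the empty string `""`. The function names are hypothetical.

```python
from typing import Optional, Sequence


def is_blank(value: str) -> bool:
    # Assumption: "blank" means the string holds only whitespace,
    # which also covers the empty string "".
    return value.strip() == ""


def base_lengths(column: Sequence[Optional[str]]) -> list[int]:
    # Keep only values that are neither null (None) nor blank.
    return [len(v) for v in column if v is not None and not is_blank(v)]


def text_statistics(column: Sequence[Optional[str]]) -> dict[str, Optional[float]]:
    lengths = base_lengths(column)
    if not lengths:
        # Nothing left to measure once null and blank values are removed.
        return {"Minimal Length": None, "Maximal Length": None, "Average Length": None}
    return {
        "Minimal Length": min(lengths),
        "Maximal Length": max(lengths),
        "Average Length": sum(lengths) / len(lengths),
    }


print(text_statistics(["Brayan", "Ava", " ", "", None]))
# {'Minimal Length': 3, 'Maximal Length': 6, 'Average Length': 4.5}
```

Because null and blank values are filtered out before measuring, the space and the empty string in the sample do not pull the minimal length down to 1 or 0.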

Other text indicators compute each of the above indicators with null values included, with blank values included, or with both null and blank values included.

Null values are counted as data of 0 length, that is to say the minimal length of null values is 0. This means that the Minimal Length With Null and the Maximal Length With Null indicators compute the minimal/maximal length of a text field including null values, which are considered to be 0-length text.
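A minimal Python sketch of the "With Null" variants follows. It assumes that null is represented as `None`, that each null contributes a length of 0, and that blank (whitespace-only or empty) strings remain excluded, since including them is the job of the separate "With Blank And Null" indicators. The function names are hypothetical.

```python
from typing import Optional, Sequence


def lengths_with_null(column: Sequence[Optional[str]]) -> list[int]:
    lengths: list[int] = []
    for value in column:
        if value is None:
            # Null is treated as 0-length text instead of being skipped.
            lengths.append(0)
        elif value.strip() != "":
            # Non-blank text keeps its real length; blank values stay excluded.
            lengths.append(len(value))
    return lengths


def min_max_length_with_null(
    column: Sequence[Optional[str]],
) -> tuple[Optional[int], Optional[int]]:
    lengths = lengths_with_null(column)
    if not lengths:
        return (None, None)
    return (min(lengths), max(lengths))


print(min_max_length_with_null(["Brayan", "Ava", " ", None]))  # (0, 6)
```

A single null in the column is enough to bring the minimal length down to 0, while the maximal length is unaffected.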

Blank values are counted as regular data, so a blank value made of a single space has a length of 1. Empty values are counted as data of 0 length, that is to say the minimal length of blank values is 0. This means that the Minimal Length With Blank and the Maximal Length With Blank indicators compute the minimal/maximal length of a text field including blank values.
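The "With Blank" variants can be sketched the same way. This minimal Python sketch assumes that null is represented as `None` and is still excluded, and that every string, blank or not, is measured by its real number of characters, so `""` counts as 0 and `" "` counts as 1. The function names are hypothetical.

```python
from typing import Optional, Sequence


def lengths_with_blank(column: Sequence[Optional[str]]) -> list[int]:
    # Null values (None) are skipped. Every string, blank or not, keeps its
    # real length: "" counts as 0 and a single space counts as 1.
    return [len(value) for value in column if value is not None]


def min_max_length_with_blank(
    column: Sequence[Optional[str]],
) -> tuple[Optional[int], Optional[int]]:
    lengths = lengths_with_blank(column)
    if not lengths:
        return (None, None)
    return (min(lengths), max(lengths))


print(min_max_length_with_blank(["Ava", " ", "", None]))  # (0, 3)
```

Here the empty string sets the minimal length to 0; without it, the single space would give a minimal length of 1.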

The same rules apply to all the average indicators. Empty values are also counted as data of 0 length.
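A single Python function can sketch all four average indicators by toggling whether null and blank values are included. It rests on the same assumptions as before: null is `None` and counts as 0 when included, blank means a whitespace-only or empty string, and an included blank value is measured by its real length. The function and parameter names are hypothetical.

```python
from typing import Optional, Sequence


def average_length(
    column: Sequence[Optional[str]],
    include_null: bool = False,
    include_blank: bool = False,
) -> Optional[float]:
    lengths: list[int] = []
    for value in column:
        if value is None:
            if include_null:
                lengths.append(0)  # null counts as 0-length text
        elif value.strip() == "":
            if include_blank:
                lengths.append(len(value))  # "" counts as 0, " " as 1
        else:
            lengths.append(len(value))
    if not lengths:
        return None
    return sum(lengths) / len(lengths)


values = ["Brayan", "Ava", "", None]
print(average_length(values))                                         # 4.5 (base)
print(average_length(values, include_null=True))                      # 3.0 (with null)
print(average_length(values, include_blank=True))                     # 3.0 (with blank)
print(average_length(values, include_null=True, include_blank=True))  # 2.25 (with blank and null)
```

Including 0-length values does not change the sum but increases the count, which is why each included null or empty value lowers the average.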

For example, consider computing the length of the text fields in a column that contains the following values, using all the different types of text statistics indicators:

Value           Number of characters
"Brayan"        6
"Ava"           3
"_"             1
""              0
<null>          <null>
"__________"    10
Note: "_" represents a space character.
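To experiment with this example outside the Studio, the column can be passed through a Python sketch of all twelve indicators. The sketch assumes that null is `None` (counted as 0 when included), that blank means a whitespace-only or empty string, and that an included blank value is measured by its real number of characters. These are assumptions about the Studio's behavior, so the printed values are illustrative rather than the Studio's official output. The names `COLUMN`, `VARIANTS`, `lengths`, and `all_indicators` are hypothetical.

```python
from typing import Optional, Sequence

# The example column; "_" in the table stands for a space character.
COLUMN: list[Optional[str]] = ["Brayan", "Ava", " ", "", None, " " * 10]

# Indicator name suffix -> (include null values, include blank values)
VARIANTS = {
    "": (False, False),
    " With Null": (True, False),
    " With Blank": (False, True),
    " With Blank And Null": (True, True),
}


def lengths(column: Sequence[Optional[str]], include_null: bool, include_blank: bool) -> list[int]:
    out: list[int] = []
    for value in column:
        if value is None:
            if include_null:
                out.append(0)  # null counts as 0-length text
        elif value.strip() == "":
            if include_blank:
                out.append(len(value))  # blank keeps its real length
        else:
            out.append(len(value))
    return out


def all_indicators(column: Sequence[Optional[str]]) -> dict[str, Optional[float]]:
    results: dict[str, Optional[float]] = {}
    for suffix, (with_null, with_blank) in VARIANTS.items():
        ls = lengths(column, with_null, with_blank)
        results["Minimal Length" + suffix] = min(ls) if ls else None
        results["Maximal Length" + suffix] = max(ls) if ls else None
        results["Average Length" + suffix] = sum(ls) / len(ls) if ls else None
    return results


for name, value in all_indicators(COLUMN).items():
    print(f"{name}: {value}")
```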
The results are as follows:

The following table shows the indicators that you can select in any database:

Data type Number Text Date Others
Analysis engine type Java SQL Java SQL Java SQL Java SQL
Minimal Length
Minimal Length With Null
Minimal Length With Blank
Minimal Length With Blank And Null
Maximal Length
Maximal Length With Null
Maximal Length With Blank
Maximal Length With Blank And Null
Average Length
Average Length With Null
Average Length With Blank
Average Length With Blank And Null