Text statistics - 7.1

Talend Real-time Big Data Platform Studio User Guide

Author: Talend Documentation Team
Version: 7.1
Product: Talend Real-Time Big Data Platform
Task: Design and Development
Platform: Talend Studio

You can use the text statistics indicators to analyze a column only if its data mining type is set to nominal in the analysis editor. Otherwise, these indicators are grayed out in the Indicator Selection dialog box. For further information on the available data mining types, see Data mining types.

The text statistics indicators analyze the lengths of the textual fields in a column: minimum, maximum and average length.

  • Minimal Length: computes the minimal length of the non-null, non-empty text fields in the column.
  • Maximal Length: computes the maximal length of the non-null, non-empty text fields in the column.
  • Average Length: computes the average length of the non-null, non-empty text fields in the column.
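To illustrate the three indicators above, the following Python sketch computes them for a list of values. It is not Talend's implementation: the function name basic_text_stats is hypothetical, a null value is modeled as None, "empty" is assumed to mean the zero-length string "", and returning None when no value qualifies is an arbitrary choice made for the sketch.

```python
from typing import Optional, Sequence


def basic_text_stats(values: Sequence[Optional[str]]) -> dict:
    """Hypothetical sketch: min, max and average length of non-null, non-empty values."""
    # Skip None (null) and "" (empty); every other value is measured with len().
    lengths = [len(v) for v in values if v is not None and v != ""]
    if not lengths:
        # Assumption: no qualifying value means no statistic.
        return {"min": None, "max": None, "avg": None}
    return {
        "min": min(lengths),
        "max": max(lengths),
        "avg": sum(lengths) / len(lengths),
    }
```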

Each of the above indicators also has variants that include null values, blank values, or both null and blank values in the computation.

Null values are counted as text of length 0. This means that the Minimal Length With Null and Maximal Length With Null indicators compute the minimal and maximal lengths of the text fields including null values, which are treated as 0-length text. As a result, Minimal Length With Null returns 0 whenever the column contains a null value.

Blank values are counted as regular data of length 0. This means that the Minimal Length With Blank and Maximal Length With Blank indicators compute the minimal and maximal lengths of the text fields including blank values. As a result, Minimal Length With Blank returns 0 whenever the column contains a blank value.

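The same rules apply to the average indicators: any null or blank values that a variant includes are counted as 0-length values when the average is computed.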
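The following Python sketch illustrates how the With Null and With Blank variants differ from the default indicators. The function text_length_stats and its with_null and with_blank flags are hypothetical, not part of Talend Studio. The sketch assumes that a blank value is the empty string "" and that whitespace-only strings such as " " are measured by their actual character count; if your Talend version treats whitespace-only strings as blank, change is_blank accordingly.

```python
from typing import Optional, Sequence


def is_blank(value: str) -> bool:
    # Assumption: only the empty string counts as blank in this sketch.
    return value == ""


def text_length_stats(
    values: Sequence[Optional[str]],
    with_null: bool = False,
    with_blank: bool = False,
) -> dict:
    """Hypothetical sketch: min, max and average length, optionally counting nulls/blanks as 0."""
    lengths = []
    for value in values:
        if value is None:
            if with_null:
                lengths.append(0)  # null counted as 0-length text
        elif is_blank(value):
            if with_blank:
                lengths.append(0)  # blank counted as regular 0-length data
        else:
            lengths.append(len(value))
    if not lengths:
        # Assumption: no qualifying value means no statistic.
        return {"min": None, "max": None, "avg": None}
    return {
        "min": min(lengths),
        "max": max(lengths),
        # Included nulls/blanks add 0 to the sum but 1 to the count.
        "avg": sum(lengths) / len(lengths),
    }


if __name__ == "__main__":
    sample = ["Brayan", "Ava", " ", "", None, " " * 10]
    for with_null in (False, True):
        for with_blank in (False, True):
            stats = text_length_stats(sample, with_null, with_blank)
            print(f"with_null={with_null}, with_blank={with_blank}: {stats}")
```

The sample list mirrors the values of the following example, with None standing in for <null>. Because included nulls and blanks contribute 0 to the sum but still count as values, the With Null and With Blank averages can only be lower than or equal to the default average.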

For example, suppose you compute the lengths of the textual fields in a column that contains the following values, using every type of text statistics indicator:

Value           Number of characters
"Brayan"        6
"Ava"           3
"_"             1
""              0
<null>          <null>
"__________"    10
Note: "_" represents a space character.
The results are as follows: