Text statistics - Cloud - 8.0

Talend Studio User Guide

Version: Cloud 8.0
Language: English
Product:
  • Talend Big Data
  • Talend Big Data Platform
  • Talend Cloud
  • Talend Data Fabric
  • Talend Data Integration
  • Talend Data Management Platform
  • Talend Data Services Platform
  • Talend ESB
  • Talend MDM Platform
  • Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-29
Available in:
  • Big Data Platform
  • Cloud API Services Platform
  • Cloud Big Data Platform
  • Cloud Data Fabric
  • Cloud Data Management Platform
  • Data Fabric
  • Data Management Platform
  • Data Services Platform
  • MDM Platform
  • Real-Time Big Data Platform

You can use the text statistics indicators to analyze columns only if their data mining type is set to nominal in the analysis editor. Otherwise, these statistics are grayed out in the Indicator Selection dialog box. For further information on the available data mining types, see Data mining types.

Text statistics analyze the characteristics of textual fields in the columns, including minimum, maximum, and average length.

  • Minimal Length: computes the minimal length of a text field. It excludes null and blank values.
  • Maximal Length: computes the maximal length of a text field. It excludes null and blank values.
  • Average Length: computes the average length of a text field. It excludes null and blank values.
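The three base indicators can be sketched in Python as follows. This is an illustration only, not how Talend Studio implements them: it assumes that a "blank" value is an empty string or a string made only of whitespace, and the helper names `is_blank` and `text_stats` are hypothetical.

```python
from typing import Optional, Sequence


def is_blank(value: str) -> bool:
    # Assumption: a blank value is empty or made only of whitespace.
    return value.strip() == ""


def text_stats(
    values: Sequence[Optional[str]],
) -> tuple[Optional[int], Optional[int], Optional[float]]:
    """Return (minimal, maximal, average) length, excluding null and blank values."""
    lengths = [len(v) for v in values if v is not None and not is_blank(v)]
    if not lengths:
        # Nothing left to measure once nulls and blanks are removed.
        return None, None, None
    return min(lengths), max(lengths), sum(lengths) / len(lengths)


print(text_stats(["Brayan", "Ava", " ", "", None, " " * 10]))  # (3, 6, 4.5)
```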

Other text indicators are available to compute each of the above indicators including null values, including blank values, or including both null and blank values.

Null values are counted as data of 0 length, that is to say the minimal length of null values is 0. This means that Minimal Length With Null and Maximal Length With Null compute the minimal/maximal length of a text field including null values, which are considered to be 0-length text.
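A minimal sketch of the "With Null" rule, assuming blank values (empty or whitespace-only strings) are still excluded as in the base indicators. The helper names are hypothetical and not part of Talend Studio.

```python
from typing import Optional, Sequence


def is_blank(value: str) -> bool:
    # Assumption: empty or whitespace-only strings are blank.
    return value.strip() == ""


def lengths_with_null(values: Sequence[Optional[str]]) -> list[int]:
    """Lengths used by the 'With Null' indicators (hypothetical helper).

    Null values are kept and counted as 0-length text; blank values are
    still excluded, as in the base indicators.
    """
    return [0 if v is None else len(v) for v in values if v is None or not is_blank(v)]


lengths = lengths_with_null(["Brayan", "Ava", " ", "", None, " " * 10])
print(min(lengths), max(lengths))  # 0 6
```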

Blank values are counted as regular data according to their number of characters; for example, a value made of a single space has a length of 1. Empty values are counted as data of 0 length, that is to say the minimal length of blank values is 0. This means that Minimal Length With Blank and Maximal Length With Blank compute the minimal/maximal length of a text field including blank values.
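The "With Blank" and "With Blank And Null" rules can be sketched the same way, assuming each blank value keeps its actual character count (an empty string contributes 0, a single space contributes 1). These hypothetical helpers illustrate the counting rules only; they are not Talend's implementation.

```python
from typing import Optional, Sequence


def lengths_with_blank(values: Sequence[Optional[str]]) -> list[int]:
    # 'With Blank': nulls are still excluded; blank values keep their
    # character count, so "" contributes 0 and " " contributes 1.
    return [len(v) for v in values if v is not None]


def lengths_with_blank_and_null(values: Sequence[Optional[str]]) -> list[int]:
    # 'With Blank And Null': additionally counts nulls as 0-length text.
    return [0 if v is None else len(v) for v in values]


column = ["Brayan", "Ava", " ", "", None, " " * 10]
print(min(lengths_with_blank(column)), max(lengths_with_blank(column)))  # 0 10
```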

The same rules apply to all the average indicators: empty values are also counted as data of 0 length.
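To show how these rules affect the averages, here is a hypothetical `average_length` helper applied to a small made-up sample. It assumes the same counting rules as described above (null as 0-length text, blank values at their character count, blank meaning empty or whitespace-only); it is a sketch, not Talend's code.

```python
from typing import Optional, Sequence


def average_length(values: Sequence[Optional[str]], with_null: bool = False,
                   with_blank: bool = False) -> Optional[float]:
    """Average text length under the counting rules described above (a sketch)."""
    lengths = []
    for v in values:
        if v is None:
            if with_null:
                lengths.append(0)  # null counted as 0-length text
        elif v.strip() == "":  # assumption: empty or whitespace-only means blank
            if with_blank:
                lengths.append(len(v))  # empty -> 0, spaces -> their count
        else:
            lengths.append(len(v))
    return sum(lengths) / len(lengths) if lengths else None


sample = ["abcd", None, ""]
print(average_length(sample))                                   # 4.0
print(average_length(sample, with_null=True))                   # 2.0
print(average_length(sample, with_blank=True))                  # 2.0
print(average_length(sample, with_null=True, with_blank=True))  # 1.3333333333333333
```

Including null or empty values adds 0-length entries to the denominator, which is why every variant here lowers the average compared with Average Length.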

For example, suppose you compute the length of the text fields in a column containing the following values, using all the types of text statistics indicators:

Value | Number of characters
"Brayan" | 6
"Ava" | 3
"_" | 1
"" | 0
<null> | <null>
"__________" | 10
Note: "_" represents a space character.
The results are as follows:
[Figure: Table and graphical results of the Text Statistics indicator.]
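The results figure is not reproduced in this text. As a rough cross-check, the following Python sketch computes all twelve indicators for the example column. It assumes that blank means empty or whitespace-only, that nulls count as 0-length text when included, and that blank values keep their character count; the names `COLUMN`, `lengths_for`, `VARIANTS`, and `MEASURES` are hypothetical. The values it prints come from these assumptions and may differ from what Talend Studio reports.

```python
from typing import Callable, Optional

# Example column from the table above; "_" in the table stands for a space.
COLUMN: list[Optional[str]] = ["Brayan", "Ava", " ", "", None, " " * 10]


def is_blank(value: str) -> bool:
    # Assumption: empty and whitespace-only strings are blank.
    return value.strip() == ""


def lengths_for(values: list[Optional[str]], with_null: bool, with_blank: bool) -> list[int]:
    # Nulls count as 0-length text when included; blanks keep their character count.
    out = []
    for v in values:
        if v is None:
            if with_null:
                out.append(0)
        elif is_blank(v):
            if with_blank:
                out.append(len(v))
        else:
            out.append(len(v))
    return out


# Indicator name suffix -> (include nulls, include blanks).
VARIANTS = {
    "": (False, False),
    " With Null": (True, False),
    " With Blank": (False, True),
    " With Blank And Null": (True, True),
}
MEASURES: dict[str, Callable[[list[int]], float]] = {
    "Minimal Length": min,
    "Maximal Length": max,
    "Average Length": lambda xs: sum(xs) / len(xs),
}

results = {
    measure + suffix: fn(lengths_for(COLUMN, *flags))
    for measure, fn in MEASURES.items()
    for suffix, flags in VARIANTS.items()
}
for name, value in results.items():
    print(f"{name}: {value}")
```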

The following table shows the indicators that you can select in any database:

Indicator | Supported data types with the Java analysis engine | Supported data types with the SQL analysis engine
Minimal Length | Text | Text
Minimal Length With Null | Text | Text
Minimal Length With Blank | Text | Text
Minimal Length With Blank And Null | Text | Text
Maximal Length | Text | Text
Maximal Length With Null | Text | Text
Maximal Length With Blank | Text | Text
Maximal Length With Blank And Null | Text | Text
Average Length | Text | Text
Average Length With Null | Text | Text
Average Length With Blank | Text | Text
Average Length With Blank And Null | Text | Text