Simple statistics - Cloud - 8.0

Talend Studio User Guide

Version
Cloud
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Cloud
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Content
Design and Development
Last publication date
2024-02-29
Available in...

Big Data Platform

Cloud API Services Platform

Cloud Big Data Platform

Cloud Data Fabric

Cloud Data Management Platform

Data Fabric

Data Management Platform

Data Services Platform

MDM Platform

Real-Time Big Data Platform

Simple statistics indicators count the records that fall into certain categories: rows, null values, distinct and unique values, duplicates, and blank fields.

  • Blank Count: counts the number of blank rows. A blank is a non-null text value that contains only whitespace. Oracle does not distinguish between the empty string and the null value, so in Oracle an empty string is counted as a null value rather than as a blank.

    This indicator does not support the LONG VARCHAR data type in Vertica.

  • Default Value Count: counts the number of values equal to the default value defined for the column.
  • Distinct Count: counts the number of distinct values of your column.
  • Duplicate Count: counts the number of distinct values that appear more than once. The following relation holds: Duplicate Count + Unique Count = Distinct Count. For example, the values a, a, a, a, b, b, c, d, e give 9 values, 5 distinct values, 3 unique values (c, d, e), and 2 duplicate values (a, b).
  • Null Count: counts the number of null rows.
  • Row Count: counts the number of rows.
  • Unique Count: counts the number of distinct values that occur only once. It is always less than or equal to the Distinct Count.
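The definitions above can be checked with a small in-memory sketch. This is not Talend's implementation. It assumes that `None` stands for a NULL, that a blank is any non-null string made only of whitespace (including the empty string), and that nulls are excluded from the distinct, unique, and duplicate counts, as SQL's `COUNT(DISTINCT ...)` does. The function name `simple_statistics` and its `default` and `empty_is_null` parameters are hypothetical. `empty_is_null` imitates the Oracle behavior described for Blank Count.

```python
from collections import Counter
from typing import Any, Optional, Sequence


def simple_statistics(values: Sequence[Optional[Any]],
                      default: Optional[Any] = None,
                      empty_is_null: bool = False) -> dict:
    """Compute the simple statistics indicators for one column of values.

    Hypothetical sketch, not Talend's implementation. Assumptions: None
    stands for NULL, a blank is a non-null string made only of whitespace
    (the empty string included), and nulls are excluded from the distinct,
    unique, and duplicate counts.
    """
    if empty_is_null:
        # Imitate Oracle, which treats the empty string as NULL.
        values = [None if v == "" else v for v in values]
    non_null = [v for v in values if v is not None]
    occurrences = Counter(non_null)  # value -> number of occurrences
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(occurrences),
        "unique_count": sum(1 for n in occurrences.values() if n == 1),
        "duplicate_count": sum(1 for n in occurrences.values() if n > 1),
        "blank_count": sum(1 for v in non_null
                           if isinstance(v, str) and v.strip() == ""),
        # Only meaningful when the column declares a default value.
        "default_value_count": (occurrences[default]
                                if default is not None else None),
    }


stats = simple_statistics(list("aaaabbcde"))
print(stats["row_count"], stats["distinct_count"],
      stats["unique_count"], stats["duplicate_count"])  # → 9 5 3 2

values = ["x", "", "  ", None]
print(simple_statistics(values)["blank_count"])                      # → 2
print(simple_statistics(values, empty_is_null=True)["blank_count"])  # → 1
```

The second example shows why the Oracle note matters. When empty strings are stored as NULL, the empty value moves from the Blank Count to the Null Count.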

The following table shows the indicators that you can select in any database:

| Indicator | Supported data types with the Java analysis engine | Supported data types with the SQL analysis engine |
|---|---|---|
| Row Count | All data types | All data types |
| Null Count | All data types | All data types |
| Distinct Count | All data types | All data types |
| Unique Count | All data types | All data types |
| Duplicate Count | All data types | All data types |
| Blank Count | Text | Text |
| Default Value Count | All data types, but only when the database table has a default value constraint | All data types, but only when the database table has a default value constraint |
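With the SQL analysis engine, the indicators are computed by queries that run in the database itself. The following sketch shows how such aggregates might look. It uses Python's built-in `sqlite3` module as a stand-in database and is not the SQL that Talend Studio generates. The function name `sql_simple_statistics` is hypothetical. Default Value Count is computed only when the column declares a default, which SQLite reports through `PRAGMA table_info`.

```python
import sqlite3


def sql_simple_statistics(conn: sqlite3.Connection, table: str,
                          column: str) -> dict:
    """Compute the indicators with SQL aggregates evaluated by SQLite.

    Hypothetical sketch: this is not the SQL that Talend Studio generates.
    Identifiers are interpolated into the SQL text, so pass trusted names only.
    """
    cur = conn.cursor()
    # COUNT(column) and COUNT(DISTINCT column) both ignore NULLs.
    # SQLite's one-argument TRIM removes spaces only, not tabs or newlines.
    row_count, null_count, distinct_count, blank_count = cur.execute(
        f"""SELECT COUNT(*),
                   COUNT(*) - COUNT({column}),
                   COUNT(DISTINCT {column}),
                   COALESCE(SUM({column} IS NOT NULL AND TRIM({column}) = ''), 0)
            FROM {table}"""
    ).fetchone()
    # Occurrences per non-null value: n = 1 is unique, n > 1 is duplicated.
    unique_count, duplicate_count = cur.execute(
        f"""SELECT COALESCE(SUM(n = 1), 0), COALESCE(SUM(n > 1), 0)
            FROM (SELECT COUNT(*) AS n FROM {table}
                  WHERE {column} IS NOT NULL GROUP BY {column})"""
    ).fetchone()
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk);
    # dflt_value is the SQL text of the default, or None when there is none.
    columns = cur.execute(f"PRAGMA table_info({table})").fetchall()
    default_sql = next((c[4] for c in columns if c[1] == column), None)
    default_value_count = None
    if default_sql is not None:
        (default_value_count,) = cur.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {column} = {default_sql}"
        ).fetchone()
    return {
        "row_count": row_count,
        "null_count": null_count,
        "distinct_count": distinct_count,
        "unique_count": unique_count,
        "duplicate_count": duplicate_count,
        "blank_count": blank_count,
        "default_value_count": default_value_count,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c TEXT DEFAULT 'n/a')")
conn.executemany("INSERT INTO t (c) VALUES (?)",
                 [(v,) for v in ["a", "a", "b", "  ", None, "n/a"]])
result = sql_simple_statistics(conn, "t", "c")
print(result)
```

Pushing the work into `COUNT`, `SUM`, and `GROUP BY` means that only a handful of numbers leave the database, rather than every row of the column. If the table had no `DEFAULT` clause, `default_value_count` would stay `None`, which matches the default value constraint requirement in the table.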