Advanced statistics - 7.3

Talend Open Studio User Guide

Version: 7.3
Language: English
Product: Talend Open Studio for Big Data, Talend Open Studio for Data Integration, Talend Open Studio for Data Quality, Talend Open Studio for ESB
Module: Talend Studio
Content: Design and Development
Last publication date: 2023-10-11
Available in: Open Studio for Data Quality

Advanced statistics indicators determine the most probable and the most frequent values and build frequency tables. The main advanced statistics indicators are the following:

  • Mode: computes the most probable value, that is, the value that occurs most often. For numerical or continuous data, you can set bins in the parameters of this indicator. The mode differs from the "average" and the "median", and it is well suited to categorical attributes.
  • Value Frequency: computes the number of occurrences of each distinct value and lists the most frequent values.
  • The other Value Frequency indicators aggregate date and numerical data (by "date", "week", "month", "quarter", "year" or "bin").
  • Value Low Frequency: computes the number of occurrences of each distinct value and lists the least frequent values.
  • The other Value Low Frequency indicators aggregate date and numerical data (by "date", "week", "month", "quarter", "year" or "bin"), where "bin" is the aggregation of numerical data by intervals.
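
The three basic indicators above can be illustrated with a short Python sketch. This is not how Talend Studio computes them; it only shows the underlying idea on an in-memory list. The function names and the `top_n` / `bottom_n` parameters are hypothetical and chosen for illustration.

```python
from collections import Counter


def value_frequency(values, top_n=10):
    """Return the top_n most frequent distinct values with their counts."""
    return Counter(values).most_common(top_n)


def value_low_frequency(values, bottom_n=10):
    """Return the bottom_n least frequent distinct values with their counts."""
    counts = Counter(values)
    # Sort by ascending count; ties are ordered by the value's string form
    # so that the result is deterministic (an illustrative choice).
    ordered = sorted(counts.items(), key=lambda kv: (kv[1], str(kv[0])))
    return ordered[:bottom_n]


def mode(values):
    """Return the most frequent value, or None if there are no values."""
    counts = Counter(values)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


colors = ["red", "blue", "red", "green", "blue", "red"]

print(value_frequency(colors, 2))      # [('red', 3), ('blue', 2)]
print(value_low_frequency(colors, 2))  # [('green', 1), ('blue', 2)]
print(mode(colors))                    # red
```

On a categorical column such as `colors`, the mode is meaningful even though an average or a median is not, which is why the mode suits categorical attributes.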

The following table shows the indicators that you can select in any database:

Data type                 Number      Text        Date        Others
Analysis engine type      Java  SQL   Java  SQL   Java  SQL   Java  SQL
Mode
Value (Low) Frequency
Date (Low) Frequency                              *     *
Week (Low) Frequency                              *     *
Month (Low) Frequency                             *     *
Quarter (Low) Frequency                           *     *
Year (Low) Frequency                              *     *
Bin (Low) Frequency
* Except for the time data type
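
The date and bin aggregations named in the table rows can be sketched in the same way: each value is first mapped to an aggregation key (a week, month, quarter, year, or numeric interval), and the keys are then counted. The sketch below is an illustration only, not Talend Studio's implementation. The helper names (`date_key`, `bin_key`, `frequency_by`), the key formats, the use of ISO week numbering, and the half-open intervals are all assumptions made for the example.

```python
import math
from collections import Counter
from datetime import date


def date_key(d, level):
    """Map a date to its aggregation key for the given level."""
    if level == "date":
        return d.isoformat()
    if level == "week":
        # Assumption: ISO 8601 week numbering.
        iso_year, iso_week, _ = d.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if level == "month":
        return f"{d.year}-{d.month:02d}"
    if level == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if level == "year":
        return str(d.year)
    raise ValueError(f"unknown aggregation level: {level}")


def bin_key(x, width, start=0):
    """Return the half-open interval [low, low + width) that contains x."""
    index = math.floor((x - start) / width)
    low = start + index * width
    return (low, low + width)


def frequency_by(values, key):
    """Count values after mapping each one to its aggregation key."""
    return Counter(key(v) for v in values)


dates = [date(2023, 1, 15), date(2023, 2, 3), date(2023, 4, 20), date(2023, 4, 21)]
ages = [3, 12, 17, 25, 29]

# Quarter frequency: two dates fall in Q1 and two in Q2.
print(frequency_by(dates, lambda d: date_key(d, "quarter")))

# Bin frequency with intervals of width 10:
# one value in [0, 10), two in [10, 20), two in [20, 30).
print(frequency_by(ages, lambda x: bin_key(x, 10)))
```

Calling `most_common()` on the returned counter gives the aggregated equivalent of a frequency indicator, and taking its first entry gives the modal bin for continuous data.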