Data profiling details - Cloud

Talend Cloud Data Catalog User Guide

Version: Cloud
Language: English (United States)
Product: Talend Cloud
Module: Talend Data Catalog
Content: Data Governance
Talend Cloud Data Catalog can store and display the following data profile information for table/view and column objects:
Type Description
Inferred Datatypes [type, rows] List of matched data types and their frequency as a percentage, sorted from highest to lowest.

The profiler detects the data type of each column. When a column contains values of different data types, the profiler chooses the most frequent one. You can override the value manually. The detected type can contradict the data type declared by the database. For example, when a VARCHAR database column contains only date values, the profiler sets the DATE data type.

The supported types are Text, Date, Time, DateTime, Geographical, and Number. Percentiles, mean, median, variance, and standard deviation are not computed.
Frequency [value, rows] Distribution of values and their frequency as a percentage.
Patterns [pattern, rows] List of different patterns of data presentation discovered in the source and their frequency as a percentage.
Inferred Semantic Types List of inferred semantic types.
Data Profiling Statistics
  • Profiling Date: Date of data profiling execution.
  • Count: Number of rows actually profiled, which is either the total number in the source or the limit set when defining the harvesting options.
  • Distinct: The non-distinct count is derived as non-distinct = total - distinct - empty. For example, when there are one million rows and the column has far fewer distinct values, such as 10, the data is considered to be distinct.
  • Duplicate: Duplicate rows in database or in files.
  • Valid: Valid rows in database or in files.
  • Empty: Null rows in database or empty rows in files.
  • Invalid: Invalid rows in database or in files.
  • Avg length: Average length of values.
  • Min length: Minimum length of values.
  • Max length: Maximum length of values.
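
The following Python sketch illustrates how the profile information described above could be derived for a single column. It is not the Talend Cloud Data Catalog implementation: the function names (`infer_type`, `to_pattern`, `profile_column`), the reduced set of types (Number, Date, Text), the single `YYYY-MM-DD` date format, and the pattern alphabet (`9` for digits, `A`/`a` for letters) are all assumptions made for illustration. Valid, Invalid, and Duplicate counts are left out because their exact criteria are not described here.

```python
from collections import Counter
from datetime import datetime


def infer_type(value):
    """Classify one non-empty string as Number, Date, or Text (assumed subset)."""
    try:
        float(value)
        return "Number"
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%d")  # assumed date format
        return "Date"
    except ValueError:
        return "Text"


def to_pattern(value):
    """Replace digits with '9' and letters with 'A'/'a'; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def with_percentages(counter, total):
    """Return (key, rows, percent) tuples sorted from highest to lowest."""
    return [(k, n, round(100 * n / total, 1)) for k, n in counter.most_common()]


def profile_column(values):
    """Compute a simplified profile for a list of string values (None or '' = empty)."""
    count = len(values)
    non_empty = [v for v in values if v not in (None, "")]
    empty = count - len(non_empty)

    frequency = Counter(non_empty)
    types = Counter(infer_type(v) for v in non_empty)
    patterns = Counter(to_pattern(v) for v in non_empty)
    lengths = [len(v) for v in non_empty]
    distinct = len(frequency)
    total = len(non_empty)

    return {
        "count": count,
        "empty": empty,
        "distinct": distinct,
        # Formula from the statistics list: non-distinct = total - distinct - empty
        "non_distinct": count - distinct - empty,
        # The most frequent type wins when a column mixes several types
        "inferred_type": types.most_common(1)[0][0] if types else None,
        "inferred_datatypes": with_percentages(types, total) if total else [],
        "frequency": with_percentages(frequency, total) if total else [],
        "patterns": with_percentages(patterns, total) if total else [],
        "avg_length": sum(lengths) / len(lengths) if lengths else 0,
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
    }


# Hypothetical sample: a text column that mostly contains dates
sample = ["2024-01-15", "2024-02-01", "2024-01-15", "n/a", "", None]
profile = profile_column(sample)
```

With this sample, three of the four non-empty values parse as dates, so the sketch reports Date as the inferred type even though the column would be declared as text, which mirrors the VARCHAR example above. Percentages here are computed over non-empty values; that choice is also an assumption.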