Sampling and profiling data - Cloud

Talend Cloud Data Catalog User Guide


Technical and descriptive metadata can contain a wealth of information about metadata elements, but only when that information has actually been documented on those elements. In many cases the metadata is incomplete, and the best way to determine what it should be (for example, the semantic data type or the valid values) is to look at the data itself.

As part of the harvesting process, Talend Cloud Data Catalog can profile the actual data contained in files and tables, in addition to capturing the metadata from the source format or tool. At harvesting time, you can specify how many records to profile and how many of them to keep as a sample for later visualization.
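To make the two harvest-time settings concrete, here is a minimal Python sketch, assuming that profiling reads the first N records of a source and keeps the first M of those as the sample. The function `harvest_profile` and its parameters `records_to_profile` and `sample_size` are hypothetical names for illustration only; they are not part of any Talend API, and the product may select its sample differently.

```python
from itertools import islice
from typing import Iterable, List, Tuple

Record = dict


def harvest_profile(
    records: Iterable[Record], records_to_profile: int, sample_size: int
) -> Tuple[int, List[Record]]:
    """Hypothetical sketch: profile up to records_to_profile records and
    keep the first sample_size of them as a sample for later display."""
    if records_to_profile < 0 or sample_size < 0:
        raise ValueError("limits must be non-negative")
    if sample_size > records_to_profile:
        raise ValueError("the sample cannot be larger than the profiled set")

    profiled = 0
    sample: List[Record] = []
    # islice stops reading the source once records_to_profile records are consumed
    for record in islice(records, records_to_profile):
        profiled += 1  # every record read here would feed the profiling statistics
        if len(sample) < sample_size:
            sample.append(record)  # only the first sample_size records are retained
    return profiled, sample


# Usage: a 1,000-row source, profiling 100 rows and keeping 10 as a sample.
rows = ({"id": i} for i in range(1000))
count, sample = harvest_profile(rows, records_to_profile=100, sample_size=10)
```

Streaming the records this way means only the sample is held in memory, while the profiled count can be much larger.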

The profiling results are then available on the page of the file or table, and when you look at the individual fields or columns of that file or table.

Talend Cloud Data Catalog aims to protect this information by showing it to authorized users only: you need the Data Viewer role to see it. Generic profiling statistics, such as “% of distinct values”, are available to all users who can view the content.

The application can store and display the following data profiling details for table/view and column objects:
  • Counts (standard and custom counts, like empty and valid rows)
  • Values (distinct values and their counts)
  • Patterns (patterns and their counts)
  • Data types (inferred data types and their counts)
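The four categories above can be illustrated with a minimal, hypothetical Python sketch that profiles one column of string values. `profile_column`, `value_pattern`, `infer_type`, and `ColumnProfile` are illustrative names, not Talend functions. The pattern notation (A for an uppercase letter, a for a lowercase letter, 9 for a digit), the type-inference order, and modeling "valid" as a caller-supplied rule are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Iterable, Optional


def value_pattern(value: str) -> str:
    """Assumed notation: A for A-Z, a for a-z, 9 for 0-9; other characters kept."""
    out = []
    for ch in value:
        if "A" <= ch <= "Z":
            out.append("A")
        elif "a" <= ch <= "z":
            out.append("a")
        elif "0" <= ch <= "9":
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def infer_type(value: str) -> str:
    """Guess a type for one non-empty value, trying narrower types first."""
    for type_name, parse in (("integer", int), ("decimal", float), ("date", date.fromisoformat)):
        try:
            parse(value)
            return type_name
        except ValueError:
            pass
    if value.lower() in ("true", "false"):
        return "boolean"
    return "string"


@dataclass
class ColumnProfile:
    total: int = 0                                         # counts
    empty: int = 0
    valid: int = 0
    values: Counter = field(default_factory=Counter)       # distinct values and counts
    patterns: Counter = field(default_factory=Counter)     # patterns and counts
    types: Counter = field(default_factory=Counter)        # inferred types and counts


def profile_column(
    values: Iterable[Optional[str]],
    is_valid: Callable[[str], bool] = lambda v: True,  # hypothetical validity rule
) -> ColumnProfile:
    profile = ColumnProfile()
    for raw in values:
        profile.total += 1
        value = "" if raw is None else raw.strip()
        if not value:
            profile.empty += 1  # None and blank strings both count as empty
            continue
        if is_valid(value):
            profile.valid += 1
        profile.values[value] += 1
        profile.patterns[value_pattern(value)] += 1
        profile.types[infer_type(value)] += 1
    return profile


# Usage: a ZIP-code column where "valid" is assumed to mean exactly five digits.
zip_codes = ["94105", "10001", "", None, "SW1A 1AA", "94105"]
p = profile_column(zip_codes, is_valid=lambda v: v.isdigit() and len(v) == 5)
```

Here `p` reports 6 rows, 2 empty and 3 valid; the pattern counter shows `99999` three times and `AA9A 9AA` once, which is the kind of signal that helps reveal an unexpected value format in a column.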