Dataset quality - Cloud

Talend Cloud Data Inventory User Guide

Several visual indicators give you a precise picture of the quality of your data.

The quality indicators are a quick and easy way to assess the quality of your data, at both the sample level and the record level. In the application, each value is categorized as empty, valid or invalid against the semantic type automatically detected for its column, using the following color code:

  • Green for data that matches the column format
  • Orange for data that does not match the column format
  • Black for empty cells

The quality indicators can be found at the following locations:

  • From the dataset list:

    The quality of your datasets is displayed in the form of a quality bar. Point your mouse over a color to display the quality statistics of the dataset. The percentage and exact number of values that are empty, valid or invalid in the sample are displayed.

  • From the dataset overview:

    In the Data quality tile of the dataset overview, you will find pie charts showing the exact percentage and number of empty, valid and invalid values across the dataset sample. Each category is displayed in a dedicated pie chart.

  • From the dataset sample header:

    In the header above your dataset, you can also find pie charts showing the distribution of empty, valid and invalid values across the dataset sample. Each category is displayed in a dedicated pie chart. Point your mouse over a chart for detailed statistics.

  • From the quality bar:

    When using the grid view of your dataset, you can see that each column header contains a quality bar. The statistics displayed here apply to that specific column. Point your mouse over each color for detailed statistics of each category. In the grid view, cells containing values that are invalid according to the column semantic type are displayed with an orange left border.
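
To make the numbers behind a quality bar concrete, the following Python sketch shows one way the count and percentage of empty, valid and invalid values could be computed for a single column of a sample. It is an illustration only, not how Talend Cloud Data Inventory computes its statistics: the function names `classify`, `quality_stats` and `is_email`, and the simplified email pattern standing in for a semantic type, are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, simplified stand-in for an "email" semantic type.
# The product's actual detection rules are not described in this guide.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_email(text):
    """Return True when the text matches the simplified email pattern."""
    return EMAIL_PATTERN.match(text) is not None


def classify(value, is_valid):
    """Categorize one cell as 'empty', 'valid' or 'invalid'."""
    if value is None or str(value).strip() == "":
        return "empty"
    return "valid" if is_valid(str(value)) else "invalid"


def quality_stats(values, is_valid):
    """Return the count and percentage of each category for one column."""
    counts = Counter(classify(v, is_valid) for v in values)
    total = len(values)
    return {
        category: {
            "count": counts[category],
            "percent": round(100 * counts[category] / total, 1) if total else 0.0,
        }
        for category in ("valid", "invalid", "empty")
    }


# Example column sample: two valid values, one invalid value, two empty cells.
sample = ["ann@example.com", "", "not-an-email", "bob@example.org", None]
stats = quality_stats(sample, is_email)

for category, figures in stats.items():
    print(f"{category}: {figures['count']} values ({figures['percent']}%)")
```

Applying `quality_stats` to every column gives per-column figures like those shown in the column headers, while applying it to all cells of the sample gives dataset-level figures like those shown in the dataset list.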