Dataset quality - Cloud

Talend Cloud Data Preparation User Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Preparation
Last publication date: 2024-04-15

Several visual indicators give you a precise idea of the quality of your data.

The quality indicators are a quick and easy way to assess the quality of your data at both the sample level and the record level. In the application, data can be categorized as invalid, empty, or valid against:
  • The semantic type of the column.
  • The data quality rules applied to one or more fields.
Tip: If you are using a Snowflake connection, you can use the pushdown parameter to calculate the dataset quality on the entire dataset. For more information, see Adding the pushdown parameter to a Snowflake connection.
Color code for the quality bars
Color | Description
Red | The values do not match the column format, or they fulfill the rule condition but not the validation expression, or the rule cannot be executed on those values (for example, if the rule must compare a string with a number). For more information on the errors, click the red vertical bar next to the value.
Gray | The cells are empty, or the values are not applicable for the rule: they do not fulfill the condition and no alternative validation expression has been defined.
Green | The values match the column format, or they fulfill all rule statements.
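
The color code above can be read as a small decision procedure. The following Python sketch is only an illustration of that logic, not the application's implementation: the function name `rule_color`, its parameters, and the handling of an alternative validation expression are assumptions made for this example.

```python
from typing import Any, Callable, Optional

RED, GRAY, GREEN = "red", "gray", "green"

def rule_color(
    value: Any,
    condition: Callable[[Any], bool],
    validation: Callable[[Any], bool],
    alternative: Optional[Callable[[Any], bool]] = None,
) -> str:
    """Return the quality color for one value checked against one rule.

    Hypothetical helper: the names and the treatment of `alternative`
    are assumptions, not part of the product.
    """
    # Empty cells are shown in gray.
    if value is None or value == "":
        return GRAY
    try:
        if condition(value):
            # Condition fulfilled: the validation expression decides.
            return GREEN if validation(value) else RED
        if alternative is None:
            # Condition not fulfilled and no alternative: not applicable.
            return GRAY
        # Assumption: an alternative expression is evaluated like a validation.
        return GREEN if alternative(value) else RED
    except TypeError:
        # The rule cannot be executed, e.g. comparing a string with a number.
        return RED

# Example rule: if the value is greater than 100, it must be below 1000.
over_100 = lambda v: v > 100
below_1000 = lambda v: v < 1000

for sample in (500, 5000, 50, "abc", ""):
    print(repr(sample), rule_color(sample, over_100, below_1000))
```

In this sketch, a value that makes the comparison raise a `TypeError` is reported as red, which mirrors the "rule cannot be executed" case in the table.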

The quality indicators can be found at the following locations:

  • From the dataset list:
    A dataset named 'customers' displays a quality bar with 1.8% of empty values.

    The quality of your datasets is displayed in the form of a quality bar. Hover over a color to display the quality statistics of the dataset. The percentage and exact number of invalid, empty, and valid values in the sample are displayed.

  • From the dataset overview:
    In the Data quality tile of the dataset overview, you will find bar charts showing the exact percentage and number of empty, valid, and invalid values across the dataset sample. Each category is displayed in a dedicated chart.
    Data quality tile showing 1.1% of invalid values, 1.8% of empty values and 97.1% of valid values.

    When the sample refresh fails, an error message is displayed in the tile. For more information, see Issues on the sample refresh.

    In the Data quality rules tile of the dataset overview, the compliance bar shows the exact percentage and number of invalid, non-applicable, and valid values across the dataset sample.
    Data quality rules tile showing two rules with compliance bars.

    If the warning or error icon is displayed next to the rule name, see Issues in the Data quality rules tile or the dataset header.

  • From the dataset sample header:
    Dataset sample header showing 1.1% of invalid values, 1.8% of empty values and 97.1% of valid values.

    In the header above your dataset, you can also find bar charts showing the distribution of invalid, empty, and valid values across the dataset sample. Each category is displayed in a dedicated chart. Hover over a chart for detailed statistics.

  • From the quality bar:
    Dataset quality bar showing phone records with 14.6% of empty values.
    When using the grid view of your dataset, you can see that each column header contains a quality bar. The statistics displayed here apply to each specific column. Hover over each color for detailed statistics of each category. In the grid view, cells containing invalid values according to the column semantic type are displayed with a red vertical bar. Click this bar to get more information on the invalid value.
    The mouse hovers on a phone number record in a grid view, with a red vertical bar indicating an invalid value.
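
To make the per-column statistics concrete, here is a minimal Python sketch of how the count and percentage of invalid, empty, and valid values in a sample could be derived. It is an illustration only: the function `quality_stats`, the `is_valid` callback, and the simplified phone pattern are assumptions for this example and do not reflect how the application detects semantic types.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Tuple

def quality_stats(
    values: Iterable[Optional[str]],
    is_valid: Callable[[str], bool],
) -> Dict[str, Tuple[int, float]]:
    """Return {category: (count, percentage)} for one column sample.

    Hypothetical helper; `is_valid` stands in for a semantic type check.
    """
    counts: Counter = Counter()
    total = 0
    for value in values:
        total += 1
        if value is None or str(value).strip() == "":
            counts["empty"] += 1
        elif is_valid(str(value)):
            counts["valid"] += 1
        else:
            counts["invalid"] += 1
    return {
        category: (
            counts[category],
            round(100 * counts[category] / total, 1) if total else 0.0,
        )
        for category in ("invalid", "empty", "valid")
    }

# Simplified, assumed pattern standing in for a "phone" semantic type.
PHONE = re.compile(r"\d{3}-\d{4}")

def looks_like_phone(text: str) -> bool:
    return PHONE.fullmatch(text) is not None

sample = ["555-0100", "", "abc", "555-0199", None]
for category, (count, percent) in quality_stats(sample, looks_like_phone).items():
    print(f"{category}: {count} values ({percent}%)")
```

Each tuple pairs the exact number of values with its share of the sample, which is the kind of detail shown when hovering over a color in a quality bar.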