The Overview page gives you an idea of the overall quality of the dataset, but more precise indicators are available.
While the Data quality tile allowed you to get an idea of the quality at the dataset level, you will now access the dataset Sample to look at the quality at the record level.
In the application, data can be categorized as empty, valid or invalid, against the semantic type automatically detected for a column, with the following color code:
- Green for data that matches the column format
- Orange for data that does not match the column format
- Black for empty cells
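The three categories above can be pictured with a small sketch. This is not how Talend Cloud Data Inventory validates data internally; the `KNOWN_COUNTRIES` set and the `classify_cell` function are hypothetical stand-ins for the validation rule of a semantic type, and the sample values are made up.

```python
# Hypothetical stand-in for the values a semantic type accepts.
KNOWN_COUNTRIES = {"France", "Germany", "Japan", "United States"}

# Color code described above, keyed by category.
COLOR = {"valid": "green", "invalid": "orange", "empty": "black"}


def classify_cell(value, accepted=KNOWN_COUNTRIES):
    """Return 'empty', 'valid' or 'invalid' for a single cell."""
    if value is None or value.strip() == "":
        return "empty"
    return "valid" if value.strip() in accepted else "invalid"


# Illustrative cells: a match, a misspelling, and a blank.
for cell in ["France", "Frnace", ""]:
    status = classify_cell(cell)
    print(repr(cell), status, COLOR[status])
```

A real semantic type may rely on dictionaries, patterns, or other rules; the set lookup here only illustrates the empty, valid, and invalid distinction.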
From the left panel menu, click the
Your dataset opens in a grid format, and all 100 rows are displayed in tabular form. The maximum sample size in Talend Cloud Data Inventory is 10,000 records. By default, the sample of your .csv file opens in a grid view, but for other file types, or depending on your preferences, you can display the sample in a hierarchical view or a raw view.
In the header above the dataset, you can see the same pie charts as in the overview, showing the distribution of invalid, empty, and valid values across the entire dataset.
Take a look at the header of each column.
When using the grid view of your dataset, every column header integrates a quality bar. The statistics displayed here apply to each specific column.
Point your mouse over each color in the quality bar of the production_country column to display the detailed statistics for this specific column.
The statistics show the cells that do not match the Country semantic type, 1 empty cell, and 91 valid cells. In the grid view, cells containing invalid values are displayed with an orange left border.
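To make the per-column figures more concrete, the following sketch tallies the categories of one column and converts them to percentages, roughly as a quality bar might. The `quality_summary` function is hypothetical, and the ten cell statuses are illustrative rather than taken from the tutorial dataset.

```python
from collections import Counter


def quality_summary(statuses):
    """Return {category: (count, percent)} for one column's cell statuses."""
    counts = Counter(statuses)
    total = len(statuses)
    summary = {}
    for category in ("valid", "invalid", "empty"):
        count = counts[category]
        # Guard against an empty column to avoid dividing by zero.
        percent = round(100 * count / total, 1) if total else 0.0
        summary[category] = (count, percent)
    return summary


# Illustrative column of 10 cells, already labeled by category.
column = ["valid"] * 7 + ["invalid"] * 2 + ["empty"]
summary = quality_summary(column)
print(summary)
```

Each tuple pairs the raw count with the share of the column, which corresponds to what hovering over a segment of the quality bar reveals.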