Working with the quality bar - 7.3

Talend Data Preparation Getting Started Guide

Version
7.3
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Data Preparation
Content
Data Quality and Preparation > Cleansing data
Last publication date
2023-01-05

The quickest way to identify incorrect data is to look at the quality bar.

Each column header includes a quality bar that shows how many cells contain valid data, invalid data, or no data. Each category is represented by a color:

  • Green for data that matches the cell format
  • White for empty cells
  • Orange for data that does not match the cell format

Click a color to open a drop-down menu with actions for that category, for example to select, delete, or clear the cells that contain invalid data. Hover over a color to display the exact number of rows in that category and the percentage of the column it represents.
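
To make the three categories concrete, here is a minimal Python sketch that computes the kind of counts and percentages the quality bar displays for a column. It is an illustration only, not how Talend Data Preparation works internally: the `classify` and `quality_summary` functions, the sample values, and the simplified email pattern are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple email pattern (an assumption for this
# sketch; it is not the validation rule used by Talend Data Preparation).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify(value):
    """Return 'empty', 'valid' or 'invalid' for a single cell."""
    if value is None or value.strip() == "":
        return "empty"      # shown in white on the quality bar
    if EMAIL_PATTERN.match(value.strip()):
        return "valid"      # shown in green
    return "invalid"        # shown in orange


def quality_summary(values):
    """Map each category to a (row count, percentage of the column) pair."""
    counts = Counter(classify(v) for v in values)
    total = len(values)
    return {
        category: (
            counts[category],
            round(100 * counts[category] / total, 1) if total else 0.0,
        )
        for category in ("valid", "empty", "invalid")
    }


# Made-up sample column, for illustration only.
emails = ["ann@example.com", "", "bob(at)example.com", "carl@example.org", None]
summary = quality_summary(emails)
print(summary)
```

With this sample column, two of the five cells are valid, two are empty, and one is invalid, which corresponds to the figures you would see when hovering over each color.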

By looking at the quality bar in the Email column header, you can see that the data contains both empty cells and invalid values. You are going to remove them.

To use the quality bar to remove the rows containing those empty or invalid cells, proceed as follows:

Procedure

  1. Click the white part of the quality bar, in the header of the Email column.

    A drop-down menu opens.

  2. Click Delete the rows with empty cell.

    The rows with empty cells in the Email column are deleted. Only the invalid values, represented by the orange part of the bar, remain.

  3. Repeat the last two steps, but this time, click the orange part of the quality bar, and select Delete the rows with invalid cell.

    The Email column no longer contains invalid values or empty cells.

  4. Use the quality bar in the same way to delete the rows with invalid cells in the Zip and Phone columns.
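
The effect of these steps can be sketched outside the application as two successive row filters. This Python example is an approximation under stated assumptions: the function names mirror the menu labels but are hypothetical, the rows are made-up sample data, and the email pattern is a simplified stand-in for whatever format check the product applies.

```python
import re

# Hypothetical, simplified email pattern (an assumption for this sketch).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_empty(value):
    """A cell counts as empty when it is missing or contains only whitespace."""
    return value is None or value.strip() == ""


def delete_rows_with_empty_cell(rows, column):
    """Keep only the rows whose cell in `column` is not empty."""
    return [row for row in rows if not is_empty(row.get(column))]


def delete_rows_with_invalid_cell(rows, column, pattern):
    """Keep the rows whose cell in `column` is empty or matches `pattern`.

    Empty cells are a separate category from invalid ones, so this
    filter leaves them untouched.
    """
    return [
        row for row in rows
        if is_empty(row.get(column)) or pattern.match(row[column].strip())
    ]


# Made-up sample rows, for illustration only.
rows = [
    {"Name": "Ann", "Email": "ann@example.com"},
    {"Name": "Bob", "Email": ""},
    {"Name": "Cleo", "Email": "cleo(at)example.com"},
    {"Name": "Dan", "Email": "dan@example.org"},
]

# Steps 1 and 2: remove the rows with an empty Email cell.
after_empty = delete_rows_with_empty_cell(rows, "Email")

# Step 3: remove the rows with an invalid Email cell.
cleaned = delete_rows_with_invalid_cell(after_empty, "Email", EMAIL_PATTERN)

print([row["Name"] for row in cleaned])
```

Note that each filter removes whole rows, not just the offending cells, which matches the wording of the menu items used in the procedure.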

Results

The only remaining column with invalid data is now State, but you are going to treat it in a different way.