Removing empty and invalid records - Cloud

Talend Cloud Data Preparation Examples

Version
Cloud
Language
English
Product
Talend Cloud
Module
Talend Data Preparation
Content
Data Quality and Preparation > Cleansing data
Last publication date
2024-04-04

The quality bar indicates that a column contains empty or invalid records.

Some columns in the grid with their quality bar showing invalid or empty cells.

In the quality bar, data that matches the column type is shown in green, while red shows invalid data that does not match the column type. Empty records are shown in grey.

Because you want to focus on customers from a specific age range and specific states, empty data in the corresponding columns would be useless for you. You are simply going to remove the rows with empty data in the age and state columns, as well as removing invalid values from the dataset.

Procedure

  1. Click the white menu icon on the top left of the grid and select Display rows with invalid or empty values.
    Filtering options for the empty and invalid rows in the whole dataset.

    You can see that this action creates a filter on your data, and only the rows with empty or invalid entries from the dataset are now displayed. All the filters that are applied on your data at any moment can be seen on top of the grid.

    In addition, you can see that a new option is available at the bottom of the functions panel. Indeed, when any filter, or condition, is applied, you have the choice to apply functions on the full data, or the filtered data only. This is Talend Cloud Data Preparation's way of working with conditions.

    Now that this first filter is active on your whole data, you are going to specifically remove empty records from the age and state columns.

  2. Click the grey part of the quality bar in the header of the age column.
  3. From the menu that opens, select Delete the rows with empty cell.

    Now that the empty values have been removed, the age column only contains valid data, as you can see in the quality bar for this column.

  4. Repeat the same operation for the state column.

    As revealed by the condition currently active, other columns in your dataset contain empty values, but since the focus is on the age and location, we can leave them as is. On the other hand, you will remove the invalid data from the phone number column, that can be used for marketing purposes.

  5. Click the red part of the quality bar of the phone number and select Delete the rows with invalid cell.

    All the data that was considered invalid has now been removed.

  6. Click the bin icon in the filter bar to clear the filter and display the whole dataset again.

Results

The rows with empty or invalid values for the age, state and phone numbers columns have been removed. Now that the dataset is a bit cleaner, you can start focusing on the data you want to put in light.