Removing empty and invalid records - 8.0

Talend Data Preparation Examples

Version
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Data Preparation
Content
Data Quality and Preparation > Cleansing data
Last publication date
2024-01-12

The quality bar indicates that a column contains empty or invalid records.

In the quality bar, data that matches the column type is shown in green, while orange shows invalid data that does not match the column type. Empty records are shown in grey.

Because you want to focus on customers from a specific age range and specific states, empty data in the corresponding columns would be useless for you. You are simply going to remove the rows with empty data in the age and state columns, as well as removing invalid values from the dataset.

Procedure

  1. Click the white menu icon on the top left of the grid and select Display rows with invalid or empty values.

    You can see that this action creates a filter on your data, and only the rows with empty or invalid entries from the dataset are now displayed. All the filters that are applied on your data at any moment can be seen on top of the grid.

    In addition, you can see that a new option is available at the bottom of the functions panel. Indeed, when any filter, or condition, is applied, you have the choice to apply functions on the full data, or the filtered data only. This is Talend Data Preparation's way of working with conditions.

    Now that this first filter is active on your whole data, you are going to specifically remove empty records from the age and state columns.

  2. Click the grey part of the quality bar in the header of the age column.
  3. From the menu that opens, select Delete the rows with empty cells.

    Now that the empty values have been removed, the age column only contains valid data, as you can see in the quality bar for this column.

  4. Repeat the same operation for the state column.

    As revealed by the condition currently active, other columns in your dataset contain empty values, but since the focus is on the age and location, we can leave them as is. On the other hand, you will remove the invalid data from the phone number column, that can be used for marketing purposes.

  5. Click the orange part of the quality bar of the phone number and select Delete the rows with invalid cell.

    All the data that was considered invalid has now been removed.

  6. Click the bin icon in the filter bar to clear the filter and display the whole dataset again.

Results

The rows with empty or invalid values for the age, state and phone numbers columns have been removed. Now that the dataset is a bit cleaner, you can start focusing on the data you want to put in light.