The quality bar indicates that a column contains empty or invalid
records.
In the quality bar, data that matches the column type is shown in green,
while orange shows invalid data that does not match the column type. Empty records are
shown in grey.
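To make the three categories concrete, here is a minimal Python sketch of how cells could be sorted into valid, invalid and empty. It is only an illustration, not Talend Data Preparation's actual implementation: the helper names `classify_cell`, `is_integer` and `quality_bar`, and the rule that an age must parse as an integer, are assumptions made for this example.

```python
from collections import Counter


def classify_cell(value, is_valid):
    """Return 'empty', 'valid' or 'invalid' for one cell (hypothetical helper)."""
    if value is None or str(value).strip() == "":
        return "empty"
    return "valid" if is_valid(str(value).strip()) else "invalid"


def is_integer(text):
    # Assumed validity rule for an integer-typed column such as age.
    try:
        int(text)
        return True
    except ValueError:
        return False


def quality_bar(values, is_valid):
    """Count cells per category, like the green, orange and grey segments."""
    counts = Counter(classify_cell(v, is_valid) for v in values)
    return {key: counts.get(key, 0) for key in ("valid", "invalid", "empty")}


# Sample data invented for the example.
ages = ["34", "", "forty", "51", None]
summary = quality_bar(ages, is_integer)
print(summary)
```

Note that an empty cell is counted separately from an invalid one, which mirrors the distinction between the grey and orange parts of the bar.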
Because you want to focus on customers from a specific age range and specific states,
rows with empty data in the corresponding columns are of no use to you. You are going
to remove the rows with empty data in the age and
state columns, and also remove the rows with invalid values from the
dataset.
Procedure
-
Click the white menu icon on the top left of the grid and select
Display rows with invalid or empty values.
This action creates a filter on your data, so that only the rows
with empty or invalid entries are now displayed. All the
filters currently applied to your data are listed above
the grid.
In addition, a new option is available at the bottom of the
functions panel. Whenever a filter, or condition, is applied, you can
choose to apply functions to the full dataset or to the filtered data only.
This is how Talend Data Preparation
works with conditions.
Now that this first filter is active on the whole dataset, you are going to
remove the empty records specifically from the age and
state columns.
-
Click the grey part of the quality bar in the header of the
age column.
-
From the menu that opens, select Delete the rows with empty
cells.
Now that the empty values have been removed, the age
column only contains valid data, as you can see in the quality bar for this
column.
-
Repeat the same operation for the state column.
As the currently active condition shows, other columns in your dataset
contain empty values, but since the focus is on age and location, you can
leave them as they are. However, you will remove the invalid data from
the phone number column, which can be used for
marketing purposes.
-
Click the orange part of the quality bar of the phone
number column and select Delete the rows with invalid
cell.
All the data that was considered invalid has now been removed.
-
Click the bin icon in the filter bar to clear the filter and display the whole
dataset again.
Results
The rows with empty or invalid values for the age,
state and phone number columns have
been removed. Now that the dataset is a bit cleaner, you can start focusing on the
data you want to highlight.
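For readers who want to picture the cleanup outside the tool, the steps above can be approximated by the following Python sketch. It is a hedged illustration, not how Talend Data Preparation works internally: the sample rows, the helper names (`is_empty`, `is_invalid_phone`, `clean`) and the `NNN-NNN-NNNN` phone format are all assumptions introduced for the example.

```python
import re

# Assumed phone format, for illustration only: NNN-NNN-NNNN.
PHONE_RE = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_empty(value):
    return value is None or str(value).strip() == ""


def is_invalid_phone(value):
    # Empty cells are not "invalid"; they form their own category.
    return not is_empty(value) and PHONE_RE.fullmatch(str(value).strip()) is None


def clean(rows):
    """Drop rows with an empty age or state, then rows with an invalid phone."""
    rows = [r for r in rows if not is_empty(r.get("age"))]
    rows = [r for r in rows if not is_empty(r.get("state"))]
    rows = [r for r in rows if not is_invalid_phone(r.get("phone"))]
    return rows


# Sample rows invented for the example.
customers = [
    {"age": "34", "state": "TX", "phone": "555-010-1234"},
    {"age": "", "state": "CA", "phone": "555-010-9999"},   # empty age
    {"age": "51", "state": "", "phone": "555-010-0000"},   # empty state
    {"age": "29", "state": "NY", "phone": "n/a"},          # invalid phone
    {"age": "42", "state": "WA", "phone": ""},             # empty phone is kept
]
cleaned = clean(customers)
print(len(cleaned), "rows remain")
```

As in the procedure, rows with an empty phone number are kept, because only invalid phone values and empty age or state values are removed.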