Cleansing your data
Now that your preparation has been saved, you can start working on the customer data as you would with any other dataset, using any of the usual functions.
The dataset that you have imported originally contains 20,000 rows, but by default only a sample of the first 10,000 rows is displayed. Don't worry, all the preparation steps that you add can be applied to the whole dataset.
You will perform some basic cleansing operations, to ensure that all the data contained in the dataset is valid and free of errors.
For example, you can see unnecessary whitespace in some entries of the First_Name and Last_Name columns.
The quality bar under each column also indicates that your data contains rows with empty or invalid cells. The Email column, for example, contains both.
You are going to delete all the empty and invalid rows from the preparation in a single action, and remove the formatting errors in the columns containing the customer names.
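The logic of these two actions can be sketched outside the tool. The following Python example is only an illustration and not how the application implements them: the cleanse function, the sample rows, and the email pattern are all assumptions made for this sketch, and only the Email column is checked for empty or invalid values.

```python
import re

# Deliberately simple pattern, assumed for illustration only; the
# validation rules actually applied by the tool are not described here.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def cleanse(rows):
    """Trim the name columns, then drop rows whose Email is empty or invalid."""
    cleaned = []
    for row in rows:
        row = dict(row)  # work on a copy so the input rows stay unchanged
        for column in ("First_Name", "Last_Name"):
            row[column] = (row.get(column) or "").strip()
        email = (row.get("Email") or "").strip()
        if not EMAIL_PATTERN.match(email):
            continue  # empty or invalid email: skip the whole row
        cleaned.append(row)
    return cleaned


# Hypothetical sample rows, not taken from the customer dataset.
sample = [
    {"First_Name": "  Ada ", "Last_Name": "Lovelace  ", "Email": "ada@example.com"},
    {"First_Name": "Alan", "Last_Name": " Turing", "Email": ""},
    {"First_Name": "Grace", "Last_Name": "Hopper", "Email": "grace.example.com"},
]
result = cleanse(sample)
```

In this sketch, only the first sample row is kept, with the surrounding whitespace removed from both name fields. The second row is dropped because its email is empty, and the third because its email has no @ sign.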
Procedure
Results
In two simple actions, you have removed all the errors contained in your dataset and improved the quality of your data.
The quality bar for each column is now completely green, indicating that there is no invalid data left in your preparation.