Creating filters on the ages and states - Cloud

Talend Cloud Data Preparation Examples

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Preparation
Content: Data Quality and Preparation > Cleansing data
Last publication date: 2024-04-04

Creating a filter is a quick way to identify or isolate data.

In this example, you will once again use filters to isolate the data that interests you most: the age and location of your customers. The data profiling area, on the bottom right of the interface, lets you interact with the charts that illustrate the data of the age and state columns and select a specific range of data.

Procedure

  1. Click the header of the age column to select its content.

    In the data profiling area, on the bottom right of the screen, you can see a vertical bar chart displaying the number of occurrences of each value listed in the column.

    Bar chart showing the distribution of values in the age column.

    You can see here that the minimum age in the column is 18 and the maximum is 80.

  2. To limit the age values displayed on the grid and create a filter on the 30-55 range, you can either:
    • Drag both ends of the range slider to select the minimum and maximum values to be displayed.
      Bar chart showing the distribution of values in the age column, filtered between 30 and 55.
    • Enter 30 as the minimum value and 55 as the maximum value directly in the dedicated fields.

    You can see that a new filter was applied to the dataset, and customer data is only displayed if it matches the condition set on the 30-55 age range.

    A filter is applied to only show the age values between 30 and 55.

    Filters can be created by manually entering values in the filter bar text area, but diagrams are a convenient and quick way to apply filters on your data, for one or several columns at a time.

    Now that you have narrowed the view to a specific age range, you will add a second filter on top of the previous one. Filters can be combined in many ways. Here, you will display the five states with the highest number of customers.

  3. Click the header of the state column to select its content.

    This time, the data is displayed as a horizontal bar chart in the profiling area.

    Bar chart showing the distribution of values for the state column.
  4. To create a filter on the top five states, those with the most customers, hold down the Shift key and click California, Texas, Florida, New York and Virginia.
    Bar chart showing the distribution of values for the state column, filtered on 5 states.

    As you can see in the filter bar, the second filter is applied on top of the first one, and only the data that matches both filters is displayed on the grid.

    Two filters are applied to only show some values of the age and state columns.
  5. To remove the data that you no longer need and keep only this sample, click the Keep these filtered rows function in the functions panel.

    This function is only available if the Apply changes to: Filtered rows radio button is selected.

  6. Click the bin icon or click the cross in each individual filter to clear the filter bar.

Results

Your sample now displays only a restricted list of customers that match the conditions you set.
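
For readers who want to reason about what the two combined filters do, the logic of the procedure can be sketched outside the tool in plain Python. This is only an illustration, not how Talend Data Preparation implements filters: the sample rows, the function names, and the choice to treat both ends of the 30-55 range as inclusive are assumptions made for this sketch.

```python
from collections import Counter

# Hypothetical sample rows; the real customer dataset is not reproduced here.
customers = [
    {"name": "Ana", "age": 34, "state": "California"},
    {"name": "Ben", "age": 62, "state": "Texas"},
    {"name": "Cara", "age": 45, "state": "Ohio"},
    {"name": "Dan", "age": 30, "state": "Florida"},
    {"name": "Eve", "age": 55, "state": "New York"},
    {"name": "Finn", "age": 18, "state": "Virginia"},
    {"name": "Gia", "age": 41, "state": "Texas"},
]

# The five states selected in step 4 of the procedure.
TOP_STATES = ["California", "Texas", "Florida", "New York", "Virginia"]


def value_counts(rows, column):
    """Count the occurrences of each value in a column, like the bar chart."""
    return Counter(row[column] for row in rows)


def filter_age_range(rows, minimum, maximum):
    """Keep rows whose age lies in the range (both bounds assumed inclusive)."""
    return [row for row in rows if minimum <= row["age"] <= maximum]


def filter_states(rows, states):
    """Keep rows whose state is one of the selected values."""
    wanted = set(states)
    return [row for row in rows if row["state"] in wanted]


# Apply the second filter on top of the first one: a row must match both.
in_age_range = filter_age_range(customers, 30, 55)
kept = filter_states(in_age_range, TOP_STATES)

print([row["name"] for row in kept])  # ['Ana', 'Dan', 'Eve', 'Gia']
```

Keeping only the rows in `kept` plays the role of the Keep these filtered rows function: rows that fail either condition, such as a customer aged 62 or a customer from a state outside the selection, are dropped from the sample.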