Creating filters on the ages and states

Talend Data Preparation Quick Examples

author
Talend Documentation Team
EnrichVersion
6.5
2.3
EnrichProdName
Talend Data Services Platform
Talend Big Data
Talend Real-Time Big Data Platform
Talend Data Integration
Talend Data Fabric
Talend MDM Platform
Talend Big Data Platform
Talend ESB
Talend Data Management Platform
task
Data Quality and Preparation > Cleansing data
EnrichPlatform
Talend Data Preparation

Creating a filter is a quick way to identify or isolate data.

In this example, you will again use filters to isolate the data of most interest to you: the age and location of your customers. The data profiling area at the bottom right of the interface lets you interact with the charts for the age and state columns and select a specific range of data.

Procedure

  1. Click the header of the age column to select its content.

    In the data profiling area, on the bottom right of the screen, you can see a vertical bar chart displaying the number of occurrences of each value listed in the column.

    Here, you can see that the minimum age in the column is 18 and the maximum is 80.

  2. To limit the age values displayed on the grid and create a filter on the 30-55 range, you can either:
    • Drag both ends of the range slider to select the minimum and maximum values to be displayed.
    • Enter 30 as minimum value and 55 as maximum directly in the dedicated fields.

    You can see that a new filter was applied to the dataset, and customer data is only displayed if it matches the condition set on the 30-55 age range.

    Filters can be created by manually entering values in the filter bar text area, but diagrams are a convenient and quick way to apply filters on your data, for one or several columns at a time.

    Now that you have a view of a specific age range, you will add a second filter on top of the previous one. Filters can be combined in many ways. Here, you will choose to display the five states with the highest number of customers.

  3. Click the header of the state column to select its content.

    This time, the data is displayed as a horizontal bar chart in the profiling area.

  4. To create a filter on the top five states, those with the most customers, hold down the Shift key and click California, Texas, Florida, New York, and Virginia.

    As you can see in the filter bar, the filter is applied on top of the first one, and only the data that corresponds to both is displayed on the grid.

  5. To remove the data that is no longer needed and keep only this sample, click the Keep these filtered rows function in the functions panel.

    This function is only available if the Apply changes to: Filtered rows radio button is activated.

  6. Click the bin icon or click the cross in each individual filter to clear the filter bar.

Results

Your sample now displays only a restricted list of customers: those that match the conditions you set.
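
The logic behind this procedure can be sketched outside the tool. The following Python example is only an illustration, not how Talend Data Preparation is implemented: the sample rows, the function names value_counts and keep_filtered_rows, and the inclusive treatment of the 30 and 55 bounds are all assumptions made for the sketch. The five state names are the ones selected in step 4.

```python
from collections import Counter

# Hypothetical sample rows; the tutorial's real dataset is not reproduced here.
customers = [
    {"name": "Ann", "age": 34, "state": "Texas"},
    {"name": "Bob", "age": 61, "state": "California"},
    {"name": "Cleo", "age": 45, "state": "Ohio"},
    {"name": "Dev", "age": 30, "state": "Virginia"},
    {"name": "Eve", "age": 55, "state": "New York"},
    {"name": "Finn", "age": 18, "state": "Florida"},
]

# States selected in step 4 of the procedure.
TOP_STATES = {"California", "Texas", "Florida", "New York", "Virginia"}


def value_counts(rows, column):
    """Count how often each value occurs in a column (what the bar charts plot)."""
    return Counter(row[column] for row in rows)


def keep_filtered_rows(rows, min_age=30, max_age=55, states=TOP_STATES):
    """Keep rows that satisfy BOTH filters: age range AND selected states.

    The bounds are treated as inclusive, which is an assumption of this sketch.
    """
    return [
        row
        for row in rows
        if min_age <= row["age"] <= max_age and row["state"] in states
    ]


kept = keep_filtered_rows(customers)
print([row["name"] for row in kept])  # ['Ann', 'Dev', 'Eve']
```

As in the filter bar, the two conditions are combined with a logical AND: Bob is in a selected state but outside the age range, and Cleo is in the age range but outside the selected states, so neither row is kept.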