Aggregating data using charts - 8.0

Talend Data Preparation User Guide

Version
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Data Preparation
Content
Data Quality and Preparation > Cleansing data
Last publication date
2024-03-26

The Chart tab shows a graphical representation of your data. It can also be used as a way of aggregating data and previewing some interesting statistics.

Data aggregation in Talend Data Preparation lets you easily combine the information from two columns to perform statistical analysis. You select a first column and compare it with the sum, maximum, minimum, or average of a second column containing numerical values. The chart then displays more advanced statistics than the ones displayed by default.
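Conceptually, this kind of aggregation is a group-by operation: records are grouped by the values of the first column, and one aggregation function is applied to the numerical values of the second column within each group. The following Python sketch only illustrates the idea; it is not how Talend Data Preparation implements aggregation, and the `aggregate` function, the `AGGREGATIONS` table, and the sample records are hypothetical, made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping of the four aggregation options (sum, max, min,
# average) to Python functions. Not a Talend API.
AGGREGATIONS = {
    "sum": sum,
    "max": max,
    "min": min,
    "average": mean,
}


def aggregate(rows, group_column, value_column, operation):
    """Group rows by group_column and apply operation to value_column.

    rows is a list of dicts, one dict per record. Returns a dict that maps
    each distinct value of group_column to the aggregated result.
    """
    func = AGGREGATIONS[operation]
    groups = defaultdict(list)
    for row in rows:
        # Collect the numerical values of the second column per group.
        groups[row[group_column]].append(row[value_column])
    return {key: func(values) for key, values in groups.items()}


# Made-up sample records, for illustration only.
customers = [
    {"Age group": "18-25", "Gender": "F", "Purchases": 12},
    {"Age group": "18-25", "Gender": "M", "Purchases": 8},
    {"Age group": "26-35", "Gender": "F", "Purchases": 5},
    {"Age group": "26-35", "Gender": "M", "Purchases": 7},
    {"Age group": "36-50", "Gender": "F", "Purchases": 3},
]

print(aggregate(customers, "Age group", "Purchases", "average"))
```

Passing `"Gender"` as the group column and `"sum"` as the operation would instead give the total purchases per gender, which is the kind of comparison you can also build in the Chart tab.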

In this example, you work for an online retail company and the dataset you are working on contains information about your customers, such as their age, gender, and number of purchases. You will use the Chart tab to quickly preview the average number of purchases depending on the age group of your customers.

Procedure

  1. Click the header of the column that will be used as the base for the aggregation, Age group in this example.
    A chart showing the number of occurrences of each age group is displayed in the data profiling area.
  2. In the Chart tab, click the display options menu, which is set to Row count by default.
  3. In the Column drop-down list, select the Purchases column.
    This column contains the information that you want to link to the age groups. The drop-down list shows all the columns that are compatible with aggregation, that is, all other columns that contain numerical data with the integer or decimal semantic type.
  4. In the Aggregation drop-down list, select Average.
  5. Click Ok.
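The rule behind the Column drop-down in step 3 can be sketched as a filter over the dataset's columns: keep every column other than the base column whose values are all integers or decimals. In the Python sketch below, Talend's own semantic type detection is replaced by a hypothetical `numeric_type` helper that simply tries to parse each value; the helper names, the "Total spent" column, and the sample records are assumptions for illustration only.

```python
from decimal import Decimal, InvalidOperation


def numeric_type(values):
    """Return "integer", "decimal" or None for a list of string values.

    Hypothetical stand-in for semantic type detection: parsing only.
    """
    if not values:
        return None
    try:
        for value in values:
            int(value)  # raises ValueError for "12.5" or "18-25"
        return "integer"
    except ValueError:
        pass
    try:
        # Every value must be a finite decimal number (no NaN or Infinity).
        if all(Decimal(value).is_finite() for value in values):
            return "decimal"
    except InvalidOperation:
        pass
    return None


def compatible_columns(rows, base_column):
    """List the columns, other than base_column, that could be aggregated."""
    columns = list(rows[0].keys()) if rows else []
    result = []
    for column in columns:
        if column == base_column:
            continue
        # Ignore empty cells when deciding whether a column is numeric.
        values = [row[column] for row in rows if row[column] != ""]
        if numeric_type(values) in ("integer", "decimal"):
            result.append(column)
    return result


# Made-up sample records, with every cell stored as text.
customers = [
    {"Age group": "18-25", "Gender": "F", "Purchases": "12", "Total spent": "35.50"},
    {"Age group": "26-35", "Gender": "M", "Purchases": "7", "Total spent": "18.20"},
]

print(compatible_columns(customers, "Age group"))
```

Empty cells are skipped here so that a numeric column with a few missing values is still offered; how Talend Data Preparation treats empty or invalid values when detecting semantic types may differ.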

Results

The Chart tab now displays the average number of purchases for each age group. For example, you can see that the 18-25 group is the one that makes the most orders. Point your mouse over each horizontal bar to see the exact average for each group of records.

You have quickly gained some insight into your data with these statistics. You could also perform other aggregation operations, such as comparing the total purchases depending on the gender of your customers, or on any other data category of your dataset.

To remove the aggregation information from the charts, click Average Purchases > Remove.