Exploring the results of the numerical correlation analysis - Cloud - 8.0

Talend Studio User Guide

Last publication date
2024-02-29
Available in...
Big Data Platform
Cloud API Services Platform
Cloud Big Data Platform
Cloud Data Fabric
Cloud Data Management Platform
Data Fabric
Data Management Platform
Data Services Platform
MDM Platform
Real-Time Big Data Platform

Before you begin

A numerical correlation analysis is defined and executed in the Profiling perspective of Talend Studio.

Procedure

  1. In the Analysis Results view of the analysis editor, click Graphics, Simple Statistics, or Data to show the generated graphic, the number of analyzed records, or the actual analyzed data, respectively.
    In the Graphics view, each data series plotted in the bubble chart has its own color, and the legend shows which color corresponds to which data.
    Graphical result of the 'Average of AGE versus count'.

    The closer a bubble is to the left axis, the less confident you can be in the average of the numeric column. For the bubble selected in the example above, the company name is missing and there are only two data records, so the bubble sits near the left axis: an age average computed from only two records is not reliable. When you look for data quality issues, such bubbles can indicate problematic values.

    Bubbles near the top or the bottom of the chart may also suggest data quality issues: in the example above, an age average that is too high or too low.

  2. From the generated graphic, you can perform the following actions:
    • Clear the check box of the values you want to hide in the bubble chart.
    • Hover over a bubble to see the correlated data values at that position.
    • Right-click a bubble and select:
      • Show in full screen to open the generated graphic in full screen.
      • View rows to access a list of all analyzed rows in the selected column.
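To make the reading of the bubble chart concrete, the following is a minimal Python sketch of the idea behind it, not Talend Studio's implementation. It groups records by a nominal column, computes the count and the average of a numeric column for each group, and flags groups with few records. The sample data, the function names `bubble_points` and `low_confidence`, and the `min_count` threshold are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (company, age). None stands for a missing company name.
rows = [
    ("Acme", 34), ("Acme", 41), ("Acme", 38), ("Acme", 45),
    ("Globex", 29), ("Globex", 52), ("Globex", 47),
    (None, 19), (None, 23),
]

def bubble_points(rows):
    """Return {group: (count, average)}: one bubble per group."""
    totals = defaultdict(lambda: [0, 0.0])  # group -> [record count, sum of values]
    for group, value in rows:
        totals[group][0] += 1
        totals[group][1] += value
    return {g: (n, s / n) for g, (n, s) in totals.items()}

def low_confidence(points, min_count=3):
    """Groups with fewer than min_count records (bubbles near the left axis)."""
    return sorted((g for g, (n, _) in points.items() if n < min_count), key=str)

points = bubble_points(rows)
for group, (count, avg) in points.items():
    print(f"{group!s:8} count={count} average={avg:.1f}")
print("low confidence:", low_confidence(points))
```

In this sample, the group with the missing company name has only two records, so its average is the one to treat with caution.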

Results

The figure below shows an example of the SQL editor listing the correlated data values at the selected position.
Overview of the SQL editor.

From the SQL editor, you can save the executed query by clicking the save icon on the editor toolbar. The saved query is then listed under the Libraries > Source Files folders in the DQ Repository tree view. For more information, see Saving the queries executed on indicators.
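The query that Talend Studio generates is not reproduced in this section. As a rough illustration only, the sketch below uses Python's built-in `sqlite3` module to show the kind of query that retrieves the rows behind one bubble, that is, the rows sharing one value of the grouping column. The table `customer`, its columns, the sample data, and the function `view_rows` are hypothetical.

```python
import sqlite3

# Hypothetical in-memory table standing in for the analyzed data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (company TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Acme", 34), ("Acme", 41), (None, 19), (None, 23)],
)

def view_rows(conn, company):
    """Return the rows for one value of the grouping column, ordered by age."""
    if company is None:
        # A missing value must be matched with IS NULL; "= NULL" never matches.
        sql = "SELECT company, age FROM customer WHERE company IS NULL ORDER BY age"
        return conn.execute(sql).fetchall()
    sql = "SELECT company, age FROM customer WHERE company = ? ORDER BY age"
    return conn.execute(sql, (company,)).fetchall()

print(view_rows(conn, None))
```

The `IS NULL` branch matters for cases like the one described in the procedure, where the selected bubble corresponds to a missing company name.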

The Simple Statistics view shows the number of analyzed records that fall into certain categories, including the number of rows, the number of distinct and unique values, and the number of duplicates.

Table and graphic showing the results for the Simple Statistics indicator.
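The categories above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the way Talend Studio computes its indicators: it assumes that a distinct value is any value that occurs at least once, a unique value occurs exactly once, and a duplicate value occurs more than once. The function name `simple_statistics` and the sample values are hypothetical.

```python
from collections import Counter

def simple_statistics(values):
    """Count rows, distinct values, unique values and duplicate values."""
    counts = Counter(values)  # value -> number of occurrences
    return {
        "row_count": len(values),
        "distinct_count": len(counts),
        # Assumption: "unique" means the value occurs exactly once.
        "unique_count": sum(1 for n in counts.values() if n == 1),
        # Assumption: "duplicate" means the value occurs more than once.
        "duplicate_count": sum(1 for n in counts.values() if n > 1),
    }

ages = [34, 41, 34, 29, 52, 41, 34]  # hypothetical sample column
stats = simple_statistics(ages)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Under these assumptions, the distinct count always equals the unique count plus the duplicate count, which is a quick sanity check on the figures.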

The Data view displays the actual analyzed data.

Analyzed data from the Data section.

You can sort the data listed in the result table by clicking any column header in the table.