Defining the analysis of discrete data - Cloud - 8.0

Talend Studio User Guide

Last publication date
2024-02-29
Available in...

Big Data Platform

Cloud API Services Platform

Cloud Big Data Platform

Cloud Data Fabric

Cloud Data Management Platform

Data Fabric

Data Management Platform

Data Services Platform

MDM Platform

Real-Time Big Data Platform

Procedure

  1. In the DQ Repository tree view, expand Metadata and browse to the numerical column you want to analyze.
  2. Right-click the numerical column and select Column Analysis > Discrete data Analysis.
    In this example, you want to convert customer ages into a number of discrete bins, or ranges of age values.
    The New Analysis wizard opens.
  3. In the Name field, enter a name for the analysis.
    Important:

    Do not use the following special characters in the item names: ~ ! ` # ^ * & \\ / ? : ; \ , . ( ) ¥ ' " « » < >

    These characters are all replaced with "_" in the file system, so names that differ only in these characters may end up as duplicate items.

  4. Set the analysis metadata and click Finish.
    The analysis opens in the analysis editor and the Simple Statistics and the Bin Frequency indicators are automatically assigned to the numeric column.
  5. Double-click the Bin Frequency indicator to open the Indicator settings dialog box.
    Overview of the Indicator Settings dialog box.
  6. Set the minimum and maximum bin values and the number of bins in the corresponding fields.
    If you set the number of bins to 0, no bin is created and the indicator computes the frequency of each value in the column.
  7. Select the Set ranges manually check box.
    Continuous numeric data is aggregated into discrete bins. Four ranges are listed in the table with a suggested bin size. The minimum value is the beginning of the first bin, and the maximum value is the end of the last bin. The size of each bin is determined by dividing the difference between the maximum and minimum values by the number of bins.
    You can modify these values at any time to set the bin sizes manually. The value in the number of bins field is then updated automatically to match the new number of ranges.
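
The bin arithmetic described in this procedure can be sketched in a few lines of Python. This is only an illustration of equal-width binning under stated assumptions, not the code Talend Studio runs: the function names `bin_ranges` and `bin_frequency` are hypothetical, and the boundary handling (each bin includes its lower bound, the last bin also includes the maximum, values outside the range are ignored) is an assumption made for the example.

```python
from collections import Counter


def bin_ranges(minimum, maximum, num_bins):
    """Return (low, high) pairs for equal-width bins between minimum and maximum."""
    if num_bins <= 0:
        return []  # 0 bins: no bin is created
    if maximum <= minimum:
        raise ValueError("maximum must be greater than minimum")
    # Bin size = (maximum - minimum) / number of bins
    width = (maximum - minimum) / num_bins
    return [(minimum + i * width, minimum + (i + 1) * width) for i in range(num_bins)]


def bin_frequency(values, minimum, maximum, num_bins):
    """Count values per bin, or per distinct value when num_bins is 0."""
    if num_bins <= 0:
        # No bins: fall back to the frequency of each value
        return dict(Counter(values))
    ranges = bin_ranges(minimum, maximum, num_bins)
    width = (maximum - minimum) / num_bins
    counts = {r: 0 for r in ranges}
    for v in values:
        if v < minimum or v > maximum:
            continue  # assumption: out-of-range values are ignored
        # Assumption: the maximum value itself falls into the last bin
        index = min(int((v - minimum) // width), num_bins - 1)
        counts[ranges[index]] += 1
    return counts


ages = [20, 25, 30, 41, 59, 60, 75]
print(bin_ranges(20, 60, 4))
print(bin_frequency(ages, 20, 60, 4))
print(bin_frequency(ages, 20, 60, 0))
```

With a minimum of 20, a maximum of 60 and 4 bins, each bin spans 10 years, which corresponds to the four ranges mentioned above. Setting the number of bins to 0 returns one count per distinct age instead.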