Extracting distinct values - Cloud - 8.0

Talend Studio User Guide

Talend Big Data
Talend Big Data Platform
Talend Cloud
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Talend Studio
Design and Development
Last publication date
Available in...

Big Data Platform

Cloud API Services Platform

Cloud Big Data Platform

Cloud Data Fabric

Cloud Data Management Platform

Data Fabric

Data Management Platform

Data Services Platform

MDM Platform

Real-Time Big Data Platform

About this task

From the Profiling perspective, you can create a column analysis to compute the number of most frequent values for each distinct record in a column. After executing the column analysis, you can generate a ready-to-use Job that will extract in an output file the distinct values from a value frequency.

You can then use these distinct values as a reference dataset for other data standardization processes.

In the example below a column analysis on a postal_code column in a MySQL database has been created and executed in the Profiling perspective.

Prerequisites: You have already created and executed a column analysis that uses the Value Frequency indicator.

To generate a Job that extracts distinct values from a value frequency, do the following


  1. In the analysis editor, right-click the Value Frequency indicator.
  2. Select Generate Job.
    The Integration perspective opens on the generated Job.
    The basic settings for the database component are already defined according to the database connection used in the column analysis.
    The basic settings for the tAggregateRow component are already defined to count the distinct values from the value frequency of the postal_code column.
  3. Optional: Use a different output component to recuperate the distinct values in a different type of file or in a database.
  4. Save your Job and press F6 to execute it.
    The Job extracts the distinct values from the value frequency and writes them in the defined output file.
    You can then use this file as a kind of a reference file in your data quality Jobs. You can use the zip codes in the file when matching data on zip codes for instance.
    For further information on the data quality components and Jobs, see the data quality chapter in the Talend Components Reference Guide.