Extracting distinct values - Cloud - 8.0

Talend Studio User Guide

Version
Cloud
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Cloud
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Content
Design and Development
Last publication date
2024-02-29
Available in...

Big Data Platform

Cloud API Services Platform

Cloud Big Data Platform

Cloud Data Fabric

Cloud Data Management Platform

Data Fabric

Data Management Platform

Data Services Platform

MDM Platform

Real-Time Big Data Platform

Before you begin

You have created and executed a column analysis that uses the Value Frequency indicator.

About this task

From the Profiling perspective, you can create a column analysis to compute the number of most frequent values for each distinct record in a column. After executing the column analysis, you can generate a ready-to-use Job that will extract in an output file the distinct values from a value frequency.

You can then use these distinct values as a reference dataset for other data standardization processes.

In the example below a column analysis on a postal_code column in a MySQL database has been created and executed in the Profiling perspective.

Procedure

  1. In the analysis editor, right-click the Value Frequency indicator.
    Contextual menu of an indicator from the Analyzed Columns section.
  2. Select Generate Job.
    The Integration perspective opens on the generated Job.
    Generated Job using the tMysqlInput, tAggregateRow, and tFileOutputDelimited components.
    The basic settings for the database component are already defined according to the database connection used in the column analysis.
    The basic settings for the tAggregateRow component are already defined to count the distinct values from the value frequency of the postal_code column.
    Overview of the tAggregateRow basic settings.
  3. Optional: Use a different output component to recuperate the distinct values in a different type of file or in a database.
  4. Save your Job and press F6 to execute it.
    The Job extracts the distinct values from the value frequency and writes them in the defined output file.
    You can then use this file as a kind of a reference file in your data quality Jobs. You can use the ZIP codes in the file when matching data on ZIP codes for instance.
    For further information on the data quality components and Jobs, see Data Quality components.