Extracting distinct values

Before you begin

You have created and executed a column analysis that uses the Value Frequency indicator.

About this task

From the Profiling perspective, you can create a column analysis to compute the number of most frequent values for each distinct record in a column. After executing the column analysis, you can generate a ready-to-use Job that will extract in an output file the distinct values from a value frequency.

You can then use these distinct values as a reference dataset for other data standardization processes.

In the example below a column analysis on a postal_code column in a MySQL database has been created and executed in the Profiling perspective.

Procedure

In the analysis editor, right-click the Value Frequency indicator.
Select Generate Job.
The Integration perspective opens on the generated Job.

The basic settings for the database component are already defined according to the database connection used in the column analysis.

The basic settings for the tAggregateRow component are already defined to count the distinct values from the value frequency of the postal_code column.
Optional: Use a different output component to recuperate the distinct values in a different type of file or in a database.
Save your Job and press F6 to execute it.
The Job extracts the distinct values from the value frequency and writes them in the defined output file.

You can then use this file as a kind of a reference file in your data quality Jobs. You can use the ZIP codes in the file when matching data on ZIP codes for instance.

For further information on the data quality components and Jobs, see Data Quality components.

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – let us know how we can improve!

Leave your feedback here