From the Profiling perspective of the studio, you can create a column analysis that uses the Value Frequency indicator to count how many times each distinct value occurs in a column. After executing the column analysis, you can generate a ready-to-use Job that extracts the distinct values from the value frequency and writes them to an output file.
You can then use these distinct values as a reference data set for other data standardization processes.
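To make the idea concrete, the following minimal Python sketch shows what a value frequency is and how its distinct values form a reference set. It is not Talend code. The sample postal codes are hypothetical, and `collections.Counter` only stands in for the Value Frequency indicator that the studio computes for you.

```python
from collections import Counter

# Hypothetical sample of postal_code values; in the studio these come
# from the database column that the column analysis profiles.
postal_codes = ["75001", "69002", "75001", "13001", "75001", "69002"]

# Value frequency: the number of occurrences of each distinct value.
frequencies = Counter(postal_codes)

# List the values ordered by frequency, most frequent first.
for value, count in frequencies.most_common():
    print(value, count)

# The distinct values alone are what you keep as a reference data set.
distinct_values = sorted(frequencies)
print(distinct_values)
```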
In the example below, a column analysis on a postal_code column in a MySQL database has been created and executed in the Profiling perspective of the studio.
Prerequisites: You have already created and executed a column analysis that uses the Value Frequency indicator.
To generate a Job that extracts distinct values from a value frequency, do the following:
In the analysis editor, right-click the Value Frequency indicator.
Select Generate Job.
The Integration perspective opens on the generated Job.
The basic settings for the database component are already defined according to the database connection used in the column analysis.
The basic settings for the tAggregateRow component are already defined to count the occurrences of each distinct value of the postal_code column.
If required, use a different output component to write the distinct values to a different type of file or to a database.
Save your Job and press F6 to execute it.
The Job extracts the distinct values from the value frequency and writes them to the defined output file.
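The data flow of the generated Job can be approximated in a few lines of Python: read the postal_code values, group identical values and count them (the role the tAggregateRow component plays), then write one line per distinct value to a delimited file. This is a sketch under stated assumptions, not the code Talend generates: the sample rows stand in for the MySQL query result, and the function name `extract_distinct_values`, the file name, the `postal_code;count` layout and the semicolon delimiter are all hypothetical choices for illustration.

```python
import csv
import os
import tempfile
from itertools import groupby


def extract_distinct_values(rows, output_path):
    """Hypothetical stand-in for the generated Job: group, count, write."""
    # groupby only merges adjacent equal values, so sort the codes first.
    codes = sorted(row["postal_code"] for row in rows)
    # One (value, count) pair per distinct postal code.
    aggregated = [(code, sum(1 for _ in group)) for code, group in groupby(codes)]
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter=";")  # assumed delimiter
        writer.writerow(["postal_code", "count"])  # assumed header
        writer.writerows(aggregated)
    return aggregated


# Hypothetical rows standing in for the database input.
sample_rows = [
    {"postal_code": "75001"},
    {"postal_code": "69002"},
    {"postal_code": "75001"},
    {"postal_code": "13001"},
    {"postal_code": "75001"},
    {"postal_code": "69002"},
]
output_path = os.path.join(tempfile.gettempdir(), "distinct_postal_codes.csv")
result = extract_distinct_values(sample_rows, output_path)
print(result)
```

Sorting before `groupby` is what makes each distinct value appear exactly once in the output, which mirrors a group-by aggregation rather than a simple pass over the rows.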
You can then use this file as a reference file in your data quality Jobs, for example to check zip codes when matching data on zip codes.
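As an illustration of using the output file as reference data, the following Python sketch loads the distinct postal codes into a set and splits incoming records into those whose zip code is in the reference set and those whose zip code is not. It is a plain set-membership lookup, not one of Talend's data quality matching components. The file layout (`postal_code;count` with a header row), the helper names `load_reference_codes` and `split_by_reference`, and the `zip` field of the records are hypothetical.

```python
import csv
import os
import tempfile


def load_reference_codes(path):
    """Read the postal codes from the first column of the reference file."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=";")
        next(reader)  # skip the assumed header row
        return {row[0] for row in reader if row}


def split_by_reference(records, reference_codes):
    """Separate records with a known zip code from those without one."""
    matched = [r for r in records if r["zip"] in reference_codes]
    unmatched = [r for r in records if r["zip"] not in reference_codes]
    return matched, unmatched


# Build a small hypothetical reference file so the sketch is self-contained.
ref_path = os.path.join(tempfile.gettempdir(), "reference_postal_codes.csv")
with open(ref_path, "w", newline="") as f:
    writer = csv.writer(f, delimiter=";")
    writer.writerow(["postal_code", "count"])
    writer.writerows([("13001", 1), ("69002", 2), ("75001", 3)])

reference = load_reference_codes(ref_path)
customers = [{"name": "A", "zip": "75001"}, {"name": "B", "zip": "99999"}]
matched, unmatched = split_by_reference(customers, reference)
print(matched, unmatched)
```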
For further information on the data quality components and Jobs, see the data quality chapter in the Talend Components Reference Guide.