Setting system or user-defined indicators - 7.1

Talend Data Management Platform Studio User Guide

author: Talend Documentation Team
EnrichVersion: 7.1
EnrichProdName: Talend Data Management Platform
task: Design and Development
EnrichPlatform: Talend Studio

About this task

Prerequisite(s): A column analysis is open in the analysis editor in the Profiling perspective of the studio. For more information, see Defining the columns to be analyzed.

Procedure

  1. From the Data preview view in the analysis editor, click Select indicators to open the Indicator Selection dialog box.
  2. From the Indicator Selection dialog box:
    Note: Pattern Frequency Statistics is of little use on a Date-type database column when the analysis runs with the SQL engine. The indicator returns no data quality issues because all dates are displayed in a single format. To learn more about profiling Date columns in Oracle, see the documentation on Date handling when profiling columns in Oracle (https://help.talend.com). If you attach the Date Pattern Frequency indicator to a date column in your analysis, you can generate a date regular expression from the analysis results. For more information, see Generating a regular expression from the Date Pattern Frequency indicator.
  3. Click OK.
    The selected indicators are attached to the analyzed columns in the Analyzed Columns view.
    The analysis in this example computes the following:
    • simple statistics on all columns. For further information about these indicators, see Simple statistics,
    • the characteristics of textual fields and the number of most frequent values for each distinct record in the fullname column. For further information, see Text statistics and Advanced statistics respectively,
    • patterns in the email column to show frequent and rare patterns so that you can identify quality issues more easily. For further information about these indicators, see Pattern frequency statistics,
    • the range, the interquartile range, and the mean and median values of the numeric data in the total_sales column. For further information about these indicators, see Summary statistics,
    • the frequency of the digits 1 through 9 in the sales figures to detect fraud. For further information, see Fraud Detection.
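
To make the indicators listed above more concrete, the following Python sketch approximates three of them outside the studio: a pattern frequency count, summary statistics, and a leading-digit frequency count of the kind used for fraud detection. It is an illustration only and not how the studio computes these indicators. The function names are hypothetical, the pattern alphabet ("A" for uppercase, "a" for lowercase, "9" for digits) is an assumption, and reading "the digits 1 through 9" as the leading digit of each figure is also an assumption.

```python
from collections import Counter
import statistics


def value_pattern(value):
    """Map a value to a character pattern (assumed alphabet):
    uppercase letter -> 'A', lowercase letter -> 'a', digit -> '9',
    any other character is kept as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def pattern_frequency(values):
    """Count how often each pattern occurs in a column of strings."""
    return Counter(value_pattern(v) for v in values)


def summary_statistics(numbers):
    """Return the range, interquartile range, mean and median of a numeric column."""
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    return {
        "range": max(numbers) - min(numbers),
        "iqr": q3 - q1,
        "mean": statistics.mean(numbers),
        "median": statistics.median(numbers),
    }


def leading_digit_frequency(numbers):
    """Count the first non-zero digit (1 through 9) of each number."""
    counts = Counter()
    for n in numbers:
        # Drop leading zeros and the decimal point, e.g. "0.042" -> "42".
        digits = str(abs(n)).lstrip("0.")
        if digits and digits[0].isdigit():
            counts[int(digits[0])] += 1
    return counts


if __name__ == "__main__":
    emails = ["john.doe@example.com", "J0hn@x.org", "jane.roe@example.com"]
    print(pattern_frequency(emails).most_common())

    # Dates stored in one format collapse to a single pattern,
    # so the pattern frequency reveals no quality issue.
    print(pattern_frequency(["2019-01-02", "2020-12-31"]))

    sales = [123, 19.5, 0.042, 987, 1450]
    print(summary_statistics(sales))
    print(sorted(leading_digit_frequency(sales).items()))
```

The second call shows the situation described in the note on Date columns: when every date is displayed in the same format, only one pattern is reported. Rare patterns in the first call, by contrast, point to values worth inspecting.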