
Setting indicators on columns

After defining the columns to be analyzed, set either system or user-defined indicators for each of the defined columns.

Setting system or user-defined indicators

Before you begin

A column analysis is open in the analysis editor in the Profiling perspective of Talend Studio.

Procedure

  1. From the Data preview section in the analysis editor, click Select indicators to open the Indicator Selection dialog box.
  2. In the Indicator Selection dialog box, select the indicators to attach to each column.
    Note:

    Pattern Frequency Statistics is not useful on a Date column in a database when the analysis is executed with the SQL engine. The indicator returns no data quality issues because all dates are displayed in one single format.

    If you attach the Date Pattern Frequency to a date column in your analysis, you can generate a date regular expression from the analysis results.

  3. Click OK.
    The selected indicators are attached to the analyzed columns in the Analyzed Columns section.
    The analysis in this example computes the following:
    • Simple statistics on all columns,
    • The characteristics of textual fields, using text statistics indicators, and the number of most frequent values for each distinct record in the indicators,
    • Frequent and rare patterns in the email column, using pattern frequency statistics indicators, so that you can identify quality issues more easily,
    • The range, the interquartile range, and the mean and median values of the numeric data in the total_sales column, using summary statistics indicators,
    • The frequency of the digits 1 through 9 in the sales figures to detect fraud, using fraud detection indicators.
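
To make two of the indicators above more concrete, here is a minimal Python sketch of what a pattern frequency count and a digit frequency count could look like outside the Studio. It is not Talend code. The substitution rule (lowercase letter to `a`, uppercase letter to `A`, digit to `9`) and the choice to count the leading digit of each number are assumptions made for illustration, and the function names and sample data are hypothetical.

```python
from collections import Counter


def char_pattern(value):
    """Reduce a string to a coarse pattern (assumed rule, for illustration):
    lowercase letter -> 'a', uppercase letter -> 'A', digit -> '9'.
    Any other character, such as '@' or '.', is kept as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def pattern_frequency(values):
    """Return (pattern, count) pairs, most frequent pattern first."""
    return Counter(char_pattern(v) for v in values).most_common()


def leading_digit_frequency(numbers):
    """Count the first non-zero digit (1 through 9) of each number.
    Zero has no such digit and is skipped."""
    counts = Counter()
    for n in numbers:
        digit = next((c for c in str(abs(n)) if c in "123456789"), None)
        if digit is not None:
            counts[int(digit)] += 1
    return dict(counts)


# Hypothetical sample data standing in for the email and total_sales columns.
emails = ["ann@example.com", "bob@example.com", "Eve99@example.org", "no-at-sign"]
sales = [120.5, 1340, 19.99, 250, 0, 0.031]

for pattern, count in pattern_frequency(emails):
    print(count, pattern)
print(leading_digit_frequency(sales))
```

In this sketch, rare patterns such as `aa-aa-aaaa` stand out against the dominant email pattern, which is the kind of signal the pattern frequency indicators are meant to surface.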

Setting options for system or user-defined indicators

Before you begin

A column analysis is open in the analysis editor. For more information, see Defining the columns to be analyzed.

About this task

You can define expected thresholds on an indicator's value to measure the quality of the data. If the indicator's value falls outside the defined thresholds, the data is considered to be of poor quality. Thresholds are optional: you can define a single threshold or none at all. You can set thresholds either by value or as a percentage of the row count.

For more information about setting indicators, see Setting system or user-defined indicators.
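
The threshold rule described above can be sketched in a few lines of Python. This is an illustration only, not how the Studio evaluates thresholds internally: the function name, its parameters, and the choice to treat both bounds as inclusive are assumptions.

```python
def check_indicator(count, row_count, lower=None, upper=None, as_percentage=False):
    """Return True when the indicator value is inside the defined thresholds.

    count         -- raw indicator value, for example a null count
    row_count     -- total number of rows analyzed
    lower, upper  -- optional bounds; None means the bound is not defined
    as_percentage -- compare count as a percentage of row_count instead of by value

    Assumption: bounds are inclusive, so a value equal to a bound passes.
    """
    if as_percentage:
        measured = 100.0 * count / row_count if row_count else 0.0
    else:
        measured = count
    if lower is not None and measured < lower:
        return False
    if upper is not None and measured > upper:
        return False
    return True


# Hypothetical figures: 3 null values in 200 rows.
print(check_indicator(3, 200, upper=0))                         # by value
print(check_indicator(3, 200, upper=2.0, as_percentage=True))   # 1.5% of rows
```

With an upper threshold of 0 by value, any null value puts the indicator outside the threshold, while the same count passes a 2% upper threshold expressed as a percentage of the row count.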

Procedure

  1. In the Analyzed Columns view in the analysis editor, click the Options icon next to the indicator.
  2. In the dialog box that opens, set the parameters for the given indicator.
    For example, to flag null values in the column you analyze, set the Upper threshold field to 0 for the Null Count indicator.
    Overview of the Indicator Settings dialog box.

    Indicator settings dialog boxes differ according to the parameters specific to each indicator. For more information about the different indicator parameters, see Indicator parameters.

  3. Click Finish to close the dialog box.
  4. Save the analysis.

Setting user-defined indicators from the analysis editor

Before you begin

To set user-defined indicators from the analysis editor for the columns to be analyzed, do the following:

Procedure

  1. Either:
    1. In the Analyzed Columns view of the analysis editor, click Add UDI next to the name of the column for which you want to define the indicator.
      The UDI selector dialog box opens.
      Location of the Add UDI icon and Overview of the UDI Selector dialog box.
    2. Select the user-defined indicators and then click OK.
  2. Or:
    1. In the DQ Repository tree view, expand Libraries > Indicators.
    2. From the User Defined Indicator folder, drag the user-defined indicators against which you want to analyze the column content and drop them onto the column name in the Analyzed Columns view.
      The user-defined indicator is listed under the column name.
    3. Optional: Set a threshold for the user-defined indicator.
    4. Save the analysis.
