Analyzing columns in a database

Talend Platform for Enterprise Integration Studio User Guide

EnrichVersion: 5.6
EnrichProdName: Talend Platform for Enterprise Integration
task: Design and Development, Data Quality and Preparation
EnrichPlatform: Talend Studio

You can analyze the content of one or more columns and run the resulting analyses with either the Java or the SQL engine. This type of analysis provides statistics about the values within each column.
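
As a rough illustration of the kind of per-column statistics such an analysis reports, the sketch below computes a row count, null count, distinct count and duplicate count for a single column with plain SQL. It is not how the Studio implements its indicators: the `customer` table, the `email` column and the `column_stats` helper are hypothetical, and an in-memory SQLite database stands in for a real database connection.

```python
import sqlite3


def column_stats(conn, table, column):
    """Return simple statistics for one column (hypothetical helper)."""
    # Identifiers are interpolated directly, so pass only trusted names.
    total, nulls, distinct = conn.execute(
        f"SELECT COUNT(*), "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), "
        f"COUNT(DISTINCT {column}) "
        f"FROM {table}"
    ).fetchone()
    # Number of distinct non-null values that occur more than once.
    duplicates = conn.execute(
        f"SELECT COUNT(*) FROM ("
        f"SELECT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL "
        f"GROUP BY {column} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    return {
        "rows": total,
        "nulls": nulls or 0,  # SUM returns NULL on an empty table
        "distinct": distinct,
        "duplicates": duplicates,
    }


# Sample data, invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com"), (4, None)],
)

stats = column_stats(conn, "customer", "email")
print(stats)
```

Pushing the aggregation into SQL, as here, keeps the data in the database; computing the same figures in application code would require fetching every value first.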

When you use the Java engine to run a column analysis, you can view the analyzed data according to parameters you set yourself. For more information, see Using the Java or the SQL engine.

Note

When you use the Java engine to run a column analysis on large data sets or on data with many problems, the analysis may fail with a Java heap error. To avoid this, it is advisable to define a maximum memory size threshold for executing the analysis. For more information, see Defining the maximum memory size threshold.

You can also analyze a set of columns. This type of analysis provides statistics on the values across the whole data set (full records). For more information, see Analyzing tables in databases.
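
To make the difference from a single-column analysis concrete, the following sketch counts total, distinct and duplicate full records, treating each row as one value. It is only an illustration under stated assumptions: the `contact` table, its columns and the `record_stats` helper are hypothetical, and SQLite stands in for the analyzed database.

```python
import sqlite3


def record_stats(conn, table):
    """Count total, distinct and duplicate full records (hypothetical helper)."""
    # The table name is interpolated directly, so pass only trusted names.
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # SELECT DISTINCT * compares whole rows, not individual columns.
    distinct = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT * FROM {table})"
    ).fetchone()[0]
    return {
        "rows": total,
        "distinct_rows": distinct,
        "duplicate_rows": total - distinct,
    }


# Sample data, invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO contact VALUES (?, ?)",
    [("Ann", "Paris"), ("Bob", "Lyon"), ("Ann", "Paris"), ("Ann", "Lyon")],
)

record_summary = record_stats(conn, "contact")
print(record_summary)
```

Note that ("Ann", "Lyon") is not a duplicate here even though both of its values appear elsewhere: only the full record is compared.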

You can also generate a Job that removes duplicate values from a specific analyzed column. For more information on removing duplicate values, see Generating a Job to Identify duplicate values in an analyzed column.

The sequence of analyzing a column involves the following steps:

  1. Defining the column(s) to be analyzed.

    For more information, see How to define the columns to be analyzed.

  2. Setting predefined system indicators or user-defined indicators for the column(s).

    For more information, see How to set indicators for the column(s) to be analyzed. For more information on indicator types and indicator management, see Indicators.

  3. Adding the patterns against which to check the content, structure and quality of the data.

    For more information, see Using regular expressions and SQL patterns in a column analysis. For more information on pattern types and management, see Patterns.
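
The pattern step above can be pictured as counting how many values of a column match a regular expression. The sketch below shows that idea only; it is not the Studio's pattern engine. The `match_counts` helper, the sample values and the deliberately simplified email expression are all assumptions made for the example.

```python
import re


def match_counts(values, pattern):
    """Count non-null values that fully match a regex (hypothetical helper)."""
    regex = re.compile(pattern)
    # Null values are excluded from both counts.
    non_null = [v for v in values if v is not None]
    matching = sum(1 for v in non_null if regex.fullmatch(v))
    return {"matching": matching, "not_matching": len(non_null) - matching}


# Simplified email pattern for illustration, not a complete validator.
EMAIL_PATTERN = r"[\w.+-]+@[\w-]+(\.[\w-]+)+"

# Sample column values, invented for the example.
emails = ["a@example.com", "not-an-email", "b@example.org", None]

counts = match_counts(emails, EMAIL_PATTERN)
print(counts)
```

Using `fullmatch` rather than `search` requires the whole value to conform to the pattern, which is usually what a data quality check on a column is after.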

The following sections provide a detailed description of each of the preceding steps.