Creating a database content analysis - 6.1

Talend Real-time Big Data Platform Studio User Guide


From the Profiling perspective of the studio, you can create an analysis of the content of a given database.

Prerequisite(s): At least one database connection is set in the Profiling perspective of the studio. For further information, see Connecting to a database.

To create a database content analysis, you must first define the relevant analysis and then select the database connection you want to analyze.

Defining the analysis

  1. In the DQ Repository tree view, expand Data Profiling.

  2. Right-click the Analyses folder and select New Analysis.

    The [Create New Analysis] wizard opens.

  3. In the filter field, start typing connection overview analysis, select Connection Overview Analysis from the list and click Next.

  4. In the Name field, enter a name for the current analysis.


    Avoid using special characters in item names, including:

    "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "‘", "”", "«", "»", "<", ">".

    These characters are all replaced with "_" in the file system, so item names that differ only in these characters can end up as duplicate items.

  5. Set the analysis metadata (purpose, description and author name) in the corresponding fields and click Next.
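
The naming note in step 4 can be illustrated with a minimal Python sketch. It assumes that each listed character is simply swapped for "_". The helper names (to_file_name, find_collisions) and the sample item names are hypothetical, the character set covers only the unambiguous entries from the list, and none of this is the studio's actual implementation.

```python
# Characters the guide lists as unsafe in item names (unambiguous subset).
# How the studio performs the replacement internally is an assumption here.
UNSAFE_CHARS = set('~!`#^&*\\/?:;".()\'¥«»<>')


def to_file_name(item_name: str) -> str:
    """Replace every unsafe character with '_' (hypothetical helper)."""
    return "".join("_" if ch in UNSAFE_CHARS else ch for ch in item_name)


def find_collisions(item_names):
    """Group item names that map to the same sanitized file name."""
    by_file_name = {}
    for name in item_names:
        by_file_name.setdefault(to_file_name(name), []).append(name)
    # Keep only the file names shared by two or more item names.
    return {f: names for f, names in by_file_name.items() if len(names) > 1}


collisions = find_collisions(
    ["crm:overview", "crm/overview", "crm_overview", "sales"]
)
```

Here the three "crm" names all map to the same file name, which is the duplicate-item situation the note warns about. Sticking to letters, digits and "_" in analysis names avoids it.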

Selecting the database connection you want to analyze

  1. Expand DB Connections and, if more than one connection exists, select the database connection to analyze.

  2. Click Next to proceed to the next step.

  3. If needed, set filters on tables and/or views in the corresponding fields, using the SQL language.

    By default, the analysis will include all tables and views in the database.

  4. Click Finish to close the [Create New Analysis] wizard.

    A folder for the newly created analysis is listed under the Analyses folder in the DQ Repository tree view, and the connection editor opens with the defined metadata.


    The display of the connection editor depends on the parameters you set in the [Preferences] window. For more information, see Setting preferences of analysis editors and analysis results.

  5. Click Analysis Parameters and:

    • In the Number of connections per analysis field, set the number of concurrent connections allowed per analysis to the selected database connection.

      Set this number according to the resources available to the database, that is, the number of concurrent connections the database can support.

    • Check or modify the filters on tables and/or views, if any.

    • Select the Reload databases check box if you want to reload all databases in your connection on the server when you run the overview analysis.

      When you try to reload a database, a message will prompt you for confirmation as any change in the database structure may affect existing analyses.

  6. In the Context Group Settings view, select from the list the context environment you want to use to run the analysis.

    The table in this view lists all the context environments, and their values, that you defined in the Contexts view of the analysis editor. For further information, see Using context variables in analyses.

  7. Click Analysis Summary to show all the parameters of the current analysis along with the current analysis execution status.

  8. Click the save icon on top of the editor and then press F6 to execute the current analysis. A message opens to confirm that the operation is in progress.

    Analysis results are stored in the Statistical information view.

  9. Click Statistical information to show analytical information about the content of the relevant database.
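
As a rough illustration of the table and view filters set in step 3, the sketch below assumes that the filters behave like SQL LIKE patterns, where % matches any run of characters and _ matches exactly one character. It uses an in-memory SQLite database as a stand-in for the profiled database. The table names, the view name and the matching_objects helper are hypothetical, and the exact filter syntax accepted by the studio may differ.

```python
import sqlite3

# In-memory stand-in for the database behind the selected connection.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_address (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
    CREATE VIEW v_big_orders AS SELECT * FROM orders WHERE total > 100;
""")


def matching_objects(conn, object_type, pattern):
    """List tables or views whose names match a LIKE pattern (hypothetical helper)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = ? AND name LIKE ? ORDER BY name",
        (object_type, pattern),
    )
    return [name for (name,) in rows]


# Assumed filter on tables: every table whose name starts with "customer".
tables = matching_objects(conn, "table", "customer%")
# Assumed filter on views: "v", then any single character, then anything.
views = matching_objects(conn, "view", "v_%")
```

With these patterns, the orders table is left out of the overview while both customer tables and the single view are kept. Leaving a filter empty corresponds to the default of including everything.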

From the Statistical information view, you can:

  • Click a catalog or a schema to list all the tables it contains along with a summary of their content: number of rows, keys and user-defined indexes.

    The selected catalog or schema is highlighted in blue. Catalogs or schemas highlighted in red indicate potential problems in the data.

  • Right-click a catalog or a schema and select Overview analysis to analyze the content of the selected item.

  • Right-click a table or a view and select Table analysis to create a table analysis on the selected item.

  • Click any column header in the analytical table to sort the data listed in catalogs or schemas alphabetically. You can sort any column of the result table in the same way.
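
To give a feel for the kind of per-table summary described above (rows, keys and user-defined indexes), here is a minimal sketch against an in-memory SQLite database. It is not how the studio computes its statistics: the schema, the sample data and the table_overview helper are hypothetical, and SQLite's PRAGMA statements stand in for whatever metadata queries the real database would need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE INDEX idx_customer_name ON customer (name);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
    INSERT INTO customer (name) VALUES ('Ann'), ('Bob'), ('Cy');
    INSERT INTO orders (total) VALUES (42.0);
""")


def table_overview(conn):
    """Return (table, rows, key_columns, user_indexes) for each table."""
    overview = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        # Field 5 of table_info is the column's position in the primary key
        # (0 means the column is not part of the key).
        keys = sum(
            1 for col in conn.execute(f'PRAGMA table_info("{table}")') if col[5] > 0
        )
        # Field 3 of index_list is the index origin; 'c' marks indexes
        # created explicitly with CREATE INDEX.
        indexes = sum(
            1 for idx in conn.execute(f'PRAGMA index_list("{table}")') if idx[3] == "c"
        )
        overview.append((table, rows, keys, indexes))
    return overview


overview = table_overview(conn)
# Re-ordering by one field is loosely analogous to clicking a column header.
by_rows = sorted(overview, key=lambda row: row[1])
```

Each tuple corresponds to one line of the summary table: the table name, its row count, the number of primary-key columns and the number of explicitly created indexes.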