Creating a catalog analysis

Talend Platform for Enterprise Integration Studio User Guide

Applies to: Talend Platform for Enterprise Integration 5.6 (Talend Studio). Tasks: Design and Development, Data Quality and Preparation.

You can analyze one specific catalog in a database, provided that the database uses the catalog entity in its physical structure. The analysis returns analytical information about the content of this catalog, for example the number of rows, the number of tables, and the number of rows per table.

Prerequisite(s): At least one database connection has been created to connect to a database that uses the "catalog" entity.

Defining the analysis

  1. In the DQ Repository tree view, expand Data Profiling.

  2. Right-click the Analyses folder and select New Analysis.

    The [Create New Analysis] wizard opens.

  3. Expand the Catalog Analysis node and then click Catalog Structure Overview.

  4. Click the Next button.

    Note

    You can go directly to this step of the analysis creation wizard by right-clicking the catalog to analyze under Metadata > DB Connections and selecting Overview analysis.

  5. In the Name field, enter a name for the current analysis.

    Note

    Avoid using special characters in the item names including:

    "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "’", "¥", "‘", "”", "«", "»", "<", ">".

    These characters are all replaced with "_" in the file system, so two different item names can map to the same file name and you may end up creating duplicate items.

  6. Set the analysis metadata (purpose, description and author name) in the corresponding fields and click Next.
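The naming note in step 5 can be illustrated with a small sketch. It is not the Studio's own code: `to_file_name` is a hypothetical helper, and the character set is limited to characters from the note that are unambiguous in plain text. It only shows why names that differ by a special character can collide once each such character becomes "_".

```python
# Characters taken from the note above (unambiguous ones only).
# Assumption: each of them is replaced by a single "_" in the file name.
SPECIAL_CHARS = set('~!`#^&*\\/?:;".()\'¥«»<>')


def to_file_name(item_name: str) -> str:
    """Hypothetical helper: replace every special character with "_"."""
    return "".join("_" if ch in SPECIAL_CHARS else ch for ch in item_name)


# Three distinct analysis names that end up with the same file name.
names = ["sales:2014", "sales/2014", "sales_2014"]
file_names = [to_file_name(name) for name in names]

for name, file_name in zip(names, file_names):
    print(f"{name!r} -> {file_name!r}")
```

All three names map to `sales_2014`, which is the kind of duplicate the note warns about.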

Selecting the catalog you want to analyze

  1. Expand DB Connections, expand the database that includes catalog entities in its physical structure, and select the catalog to analyze.

  2. Click Next.

  3. In the corresponding fields, set filters on tables and/or views according to your needs, using SQL syntax.

    By default, the analysis will include all tables and views in the catalog.

  4. Click Finish to close the [Create New Analysis] wizard.

    A folder for the newly created analysis is listed under Analyses in the DQ Repository tree view, and the analysis editor opens with the defined metadata.

    Note

    The display of the analysis editor depends on the parameters you set in the [Preferences] window. For more information, see Setting preferences of analysis editors and analysis results.

  5. Click Analysis Parameters and:

    • In the Number of connections per analysis field, set the number of concurrent connections allowed per analysis to the selected database connection.

      Set this number according to the resources available to the database, that is, the number of concurrent connections the database can support.

    • Check or modify the filters on tables and/or views, if any.

  6. In the Context Group Settings view, select from the list the context environment you want to use to run the analysis.

    The table in this view lists all context environments and the values you defined for them in the Contexts view of the analysis editor. For further information, see Using context variables in analyses.

  7. Click the save icon on top of the editor and then press F6 to execute the current analysis.

    A message opens to confirm that the operation is in progress.

    Analysis results are stored in the Statistical information view.

  8. Click Statistical information to show analytical information about the content of the relevant catalog.
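The table and view filters set in the wizard and in Analysis Parameters can be pictured with the sketch below. It assumes the filters behave like SQL `LIKE` patterns, which this section does not spell out, and it uses an in-memory SQLite database as a stand-in for a catalog. The table names, the view name and the `matching` helper are all made up for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a catalog (illustrative names).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_address (id INTEGER);
    CREATE TABLE customer_order (id INTEGER);
    CREATE TABLE product (id INTEGER);
    CREATE VIEW customer_summary AS SELECT id FROM customer_order;
""")


def matching(conn, kind, pattern):
    """Return names of objects of the given kind ('table' or 'view')
    whose name matches a SQL LIKE pattern."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = ? AND name LIKE ? ORDER BY name",
        (kind, pattern),
    )
    return [row[0] for row in rows]


# No filter: every table is included, as in the default behavior.
all_tables = matching(conn, "table", "%")

# Assumed filter examples: tables starting with "customer",
# views ending with "summary".
tables = matching(conn, "table", "customer%")
views = matching(conn, "view", "%summary")

print(all_tables, tables, views)
```

With the pattern `customer%`, the `product` table is left out of the analysis scope, while the unfiltered query keeps all three tables.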

From the Statistical information view, you can:

  • Click the catalog in the analytical table to open a result list that details all tables included in the selected catalog with a summary of their content.

    The selected catalog is highlighted in blue. Catalogs highlighted in red indicate potential problems in the data.

  • Right-click a table or a view and select Table analysis to create a table analysis on the selected item.

  • Click any column header in the analytical table to sort the listed data alphabetically.
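To make the figures in the Statistical information view more concrete, the sketch below gathers the same kind of summary (number of tables, number of views, total rows, rows per table) from a throwaway SQLite file. It is only an analogy, not how the Studio computes its results: SQLite has no catalog entity, and `catalog_overview`, `count_rows` and `max_connections` are hypothetical names. The `max_connections` argument mirrors the idea behind the Number of connections per analysis field by capping how many connections count rows at the same time.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing


def count_rows(db_path, table):
    # Each task opens and closes its own connection, so the pool size
    # bounds the number of connections open while counting.
    with closing(sqlite3.connect(db_path)) as conn:
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return table, count


def catalog_overview(db_path, max_connections=2):
    # List tables and views, then count rows per table in parallel.
    with closing(sqlite3.connect(db_path)) as conn:
        objects = conn.execute(
            "SELECT type, name FROM sqlite_master "
            "WHERE type IN ('table', 'view')"
        ).fetchall()
    tables = sorted(name for kind, name in objects if kind == "table")
    views = sorted(name for kind, name in objects if kind == "view")
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        rows_per_table = dict(pool.map(lambda t: count_rows(db_path, t), tables))
    return {
        "tables": len(tables),
        "views": len(views),
        "rows": sum(rows_per_table.values()),
        "rows_per_table": rows_per_table,
    }


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo_catalog.db")
    with closing(sqlite3.connect(db_path)) as conn:
        conn.executescript("""
            CREATE TABLE customer (id INTEGER);
            CREATE TABLE orders (id INTEGER);
            CREATE VIEW big_orders AS SELECT id FROM orders WHERE id > 1;
            INSERT INTO customer VALUES (1), (2), (3);
            INSERT INTO orders VALUES (1), (2);
        """)
    overview = catalog_overview(db_path, max_connections=2)

print(overview)
```

Raising `max_connections` speeds up the counting phase only as far as the database can serve that many connections at once, which is why the setting should follow the resources the database actually has.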