
Creating an analysis

Creating a database content analysis

From the Profiling perspective of Talend Studio, you can create an analysis to examine the content of a given database.

About this task

Before you begin, make sure that you have defined at least one database connection in the Profiling perspective of Talend Studio.

To create a database content analysis, you must first define the relevant analysis and then select the database connection you want to analyze.

From the Statistical information view, you can:
  • Click a catalog or a schema to list all tables included in it along with a summary of their content: number of rows, keys, and user-defined indexes.

    The selected catalog or schema is highlighted in blue. Catalogs or schemas highlighted in red indicate potential problems in data.

  • Right-click a catalog or a schema and select Overview analysis to analyze the content of the selected item.
  • Right-click a table or a view and select Table analysis to create a table analysis on the selected item.
  • Click any column header in the analytical table to sort the data listed in catalogs or schemas alphabetically.

Defining the connection overview analysis

Procedure

  1. In the DQ Repository tree view, expand Data Profiling.
  2. Right-click the Analyses folder and select New Analysis.
    The Create New Analysis wizard opens.
    Overview of the Create New Analysis wizard.
  3. In the filter field, start typing connection overview analysis, select Connection Overview Analysis from the list that is displayed, and click Next.
    Example of name, purpose, and description of an analysis.
    As a shortcut, you can also create a database content analysis by right-clicking the database under Metadata > DB connections and selecting Overview analysis from the contextual menu.
  4. In the Name field, enter a name for the current analysis.
    Important:

    Do not use the following special characters in the item names: ~ ! ` # ^ * & \\ / ? : ; \ , . ( ) ¥ ' " « » < >

    These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

  5. Set the analysis metadata (purpose, description, and author name) in the corresponding fields and click Next.
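
The naming restriction in step 4 is easier to see with a small example. The following is a minimal sketch, not Talend Studio's actual implementation: it assumes each listed character is replaced one-for-one with "_", and the helper name file_system_name is hypothetical. It shows how distinct item names can collapse to the same name in the file system.

```python
# Characters that the Important note says are replaced with "_" in the file system.
SPECIAL_CHARS = set("~!`#^*&\\/?:;,.()¥'\"«»<>")


def file_system_name(item_name: str) -> str:
    """Hypothetical helper: swap every listed special character for "_".

    Assumption: the replacement is one underscore per character.
    """
    return "".join("_" if ch in SPECIAL_CHARS else ch for ch in item_name)


# Three different names typed in the Name field...
names = ["sales:2024", "sales/2024", "sales_2024"]
stored = [file_system_name(name) for name in names]

# ...all end up with the same stored name, which is how duplicates arise.
print(stored)
print("distinct stored names:", len(set(stored)))
```

Under this assumption, sticking to letters, digits, and underscores in item names avoids the collision entirely.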

Selecting the database connection you want to analyze

Procedure

  1. Expand DB Connections and select a database connection to analyze, if more than one exists.
  2. Click Next.
  3. Set filters on the tables and views you want to analyze in their corresponding fields using the SQL language.
    Example of values in the Table name filter and View name filter fields.
    By default, the analysis examines all tables and views in the database.
  4. Click Finish to close the Create New Analysis wizard.
    A folder for the newly created analysis is listed under the Analyses folder in the DQ Repository tree view, and the connection editor opens with the defined metadata.
    Overview of the Analysis Metadata section containing the defined metadata.
    Note: The display of the connection editor depends on the parameters you set in the Preferences window. For more information, see Setting preferences of analysis editors and analysis results.
  5. Click Analysis Parameters and do the following:
    1. In the Number of connections per analysis field, set the number of concurrent connections allowed per analysis to the selected database connection.
      Set this number according to the resources available to the database, that is, the number of concurrent connections the database can support.
    2. Verify and, if needed, modify the filters on tables and views.
      You can use context values.
    3. If you want to reload all databases in your connection on the server when running the overview analysis, select the Reload databases check box.
      When you try to reload a database, a message will prompt you for confirmation as any change in the database structure may affect existing analyses.
  6. In the Context Settings view, select from the list the context environment you want to use to run the analysis.
    The table in this view lists all context environments and their values you define in the Context view in the analysis editor. For further information, see Using context variables in analyses.
  7. Press F6 to execute the analysis.
    A message appears at the bottom of the editor to confirm that the operation is in progress, and the analysis results open in the Analysis Results view.
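
The table and view filters set in the wizard and under Analysis Parameters are written in SQL. As a rough illustration only, the sketch below assumes that a filter value behaves like a SQL LIKE pattern, where % matches any sequence of characters and _ matches exactly one character. The table names and the helper functions are hypothetical, and case sensitivity in a real database depends on the database itself.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into an equivalent regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches any sequence of characters
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("".join(parts), re.DOTALL)


def filter_names(names, pattern):
    """Return the names that match the LIKE pattern in full."""
    regex = like_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]


# Hypothetical table names in the analyzed database.
tables = ["customer", "customer_address", "orders", "order_items"]

print(filter_names(tables, "cust%"))   # ['customer', 'customer_address']
print(filter_names(tables, "order_"))  # ['orders']
```

Leaving the filter fields empty corresponds to the default behavior described in step 3, where all tables and views are examined.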

Creating a catalog or schema analysis

You can use the Profiling perspective of Talend Studio to analyze one specific catalog or schema in a database, if this entity is used in the physical structure of the database.

The analysis returns information about the content of the catalog or schema, for example, the number of rows, the number of tables, and the number of rows per table.
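
To make these figures concrete, here is a minimal sketch that computes the same kind of summary (number of tables, total rows, rows per table, and user-defined indexes per table) against an in-memory SQLite database. It is an illustration under stated assumptions, not how Talend Studio gathers its statistics: the sample tables and the schema_overview function are hypothetical.

```python
import sqlite3

# Build a small throwaway database to summarize.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
    INSERT INTO customer (name) VALUES ('Ada'), ('Grace');
    INSERT INTO orders (customer_id, total) VALUES (1, 9.5), (1, 20.0), (2, 3.25);
""")


def schema_overview(connection):
    """Hypothetical helper: collect table, row, and index counts."""
    tables = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    rows_per_table = {}
    indexes_per_table = {}
    for table in tables:
        rows_per_table[table] = connection.execute(
            f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        # Automatically created indexes have a NULL sql column, so this
        # counts only indexes defined explicitly with CREATE INDEX.
        indexes_per_table[table] = connection.execute(
            "SELECT COUNT(*) FROM sqlite_master "
            "WHERE type = 'index' AND tbl_name = ? AND sql IS NOT NULL",
            (table,)).fetchone()[0]
    return {
        "tables": len(tables),
        "rows": sum(rows_per_table.values()),
        "rows_per_table": rows_per_table,
        "indexes_per_table": indexes_per_table,
    }


overview = schema_overview(conn)
print(overview)
```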

Before you begin

At least one database connection has been created to connect to a database that uses the "catalog" or "schema" entity. For further information, see Connecting to a database.

Procedure

  1. Under DB connections in the DQ Repository tree view, right-click the catalog or schema for which you want to create a content analysis, and select Overview analysis from the contextual menu.
    This example shows how to create a schema analysis.
  2. In the wizard that opens, enter a name for the current analysis.
    Important:

    Do not use the following special characters in the item names: ~ ! ` # ^ * & \\ / ? : ; \ , . ( ) ¥ ' " « » < >

    These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

  3. If required, set the analysis metadata (purpose, description, and author name) in the corresponding fields and click Next.
  4. Set filters on the tables and views you want to analyze in their corresponding fields using the SQL language.
    By default, the analysis examines all tables and views in the catalog.
    Example of values in the Table name filter and View name filter fields.
  5. Click Finish.
    A folder for the newly created analysis is listed under the Analyses folder in the DQ Repository tree view, and the analysis editor opens with the defined metadata.
  6. Press F6 to execute the analysis.
    A message appears at the bottom of the editor to confirm that the operation is in progress, and the analysis results open in the Analysis Results view.

    From the Statistical information view, you can:

    • Click the schema to list all tables included in it along with a summary of their content: number of rows, keys, and user-defined indexes.

      The selected schema is highlighted in blue. Schemas highlighted in red indicate potential problems in data.

    • Right-click a schema and select Overview analysis to analyze the content of the selected item.

    • Right-click a table or a view and select Table analysis to create a table analysis on the selected item. You can also view the keys and indexes of a selected table. For further information, see Displaying keys and indexes of database tables.

    • Click any column header in the analytical table to sort the listed data alphabetically.

    Possible actions from the Statistical information section.
