Skip to main content Skip to complementary content

Matching primary and foreign keys

You can create an analysis that matches foreign keys in one table to primary keys in the other table and vice versa. This redundancy analysis supports only database tables.

Before you begin

A database connection is created in the Profiling perspective.

Through this view, you can also access the actual analyzed data via the Data Explorer.

To access the analyzed data rows, right-click any of the lines in the table and select an option:
  • View match rows: access a list of all rows that could be matched in the two identical column sets.
  • View not match rows: access a list of all rows that could not be matched in the two identical column sets.
  • View rows: access a list of all rows in the two identical column sets
Information noteWarning: The data explorer does not support connections which has empty username, such as Single sign-on of MS SQL Server. If you analyze data using such connection and you try to view data rows in the Data Explorer perspective, a warning message prompt you to set your connection credentials to the SQL Server.

The figure below illustrates in the data explorer the list of all analyzed rows in the two columns.

Example of a list of analyzed rows in two columns.

From the SQL editor, you can save the executed query and list it under the Libraries > Source Files folders in the DQ Repository tree view if you click the save icon on the editor toolbar. For more information, see Saving the queries executed on indicators.

Defining the analysis to match primary and foreign keys in tables

Procedure

  1. In the DQ Repository tree view, expand the Data Profiling folder.
  2. Right-click the Analyses folder and select New Analysis.
    Contextual menu of the Analyses node.
    The Create New Analysis wizard opens.
  3. In the filter field, start typing redundancy analysis and then select Redundancy Analysis, click Next.
  4. In the Name field, enter a name for the current analysis.
    Information noteImportant:

    Do not use the following special characters in the item names: ~ ! ` # ^ * & \\ / ? : ; \ , . ( ) ¥ ' " « » < >

    These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

  5. Set the analysis metadata (purpose, description, and author name) in the corresponding fields and click Finish.
    New analysis window when the Name, Purpose, and Description fields are filled in.
    A file for the newly created analysis is displayed under the Analysis folder in the DQ Repository tree view. The analysis editor opens with the defined analysis metadata.
    Overview of the Analysis Metadata section in the Analysis Settings tab.

Selecting the primary and foreign keys

Procedure

  1. Click Analyzed Column Sets to display the corresponding view.
    In this example, you want to match the foreign keys in the customer_id column of the sales_fact_1998 table with the primary keys in the customer_id column of the customer table, and vice versa. This will explore the relationship between the two tables to show us for example if every customer has an order in the year 1998.
    Overview of the Analyzed Column Sets section in the Analysis Settings tab.
  2. From the Connection list, select the database connection relevant to the database to which you want to connect.
    You have in this list all the connections you create and centralize in the Talend Studio repository.
  3. Click A Column Set to open the Column Selection dialog box.
    If you want to check the validity of the foreign keys, select the column holding the foreign keys for the A set and the column holding the primary keys for the B set.
  4. Browse the catalogs/schemas in your database connection to reach the table holding the column you want to match.
    In this example, the column to be analyzed is customer_id that holds the foreign keys.
    Selection of the tables to be analyzed.
    You can filter the table or column lists by typing the desired text in the Table filter or Column filter fields respectively. The lists will show only the tables/columns that correspond to the text you type in.
  5. Click the table name to display all its columns in the right-hand panel of the Column Selection dialog box.
  6. In the list to the right, select the check box of the column holding the foreign keys and then click OK to proceed to the next step.
    You can drag the columns to be analyzed directly from the DQ Repository tree view to the editor.
    If you right-click any of the listed columns in the Analyzed Columns view and select Show in DQ Repository view, the selected column will be automatically located under the corresponding connection in the tree view.
  7. Click B Column Set and follow the same steps to select the column holding the primary keys or drag it from the DQ Repository to the right column panel.
    If you select the Compute only number of rows not in B check box, you will look for any missing primary keys in the column in the B set.
  8. Click Data Filter in the analysis editor to display the view where you can set a filter on each of the analyzed columns.
  9. Press F6 to execute this key-matching analysis.
    A confirmation message is displayed.
  10. Click OK in the message if you want to continue the operation.
    The execution of this type of analysis may takes some time. Wait till the Analysis Results view opens automatically showing the analysis results.

Results

Table and graphic from the Analysis Results section.

In this example, every foreign key in the sales_fact_1998 table is identified with a primary key in the customer table. However, 98.22% of the primary keys in the customer table could not be identified with foreign keys in the sales_fact_1998 table. These primary keys are for the customers who did not order anything in 1998.

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – please let us know!