Matching primary and foreign keys - 6.3

Talend MDM Platform Studio User Guide

EnrichVersion
6.3
EnrichProdName
Talend MDM Platform
task
Data Governance
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

You can create an analysis that matches foreign keys in one table to primary keys in the other table and vice versa. This redundancy analysis supports only database tables.

Prerequisite(s): At least one database connection is set in the Profiling perspective of the studio. For further information, see Connecting to a database.

To match primary and foreign keys in tables, do the following:

Defining the analysis

  1. In the DQ Repository tree view, expand the Data Profiling folder.

  2. Right-click the Analyses folder and select New Analysis.

    The [Create New Analysis] wizard opens.

  3. In the filter field, start typing redundancy analysis and then select Redundancy Analysis, click Next.

  4. In the Name field, enter a name for the current analysis.

    Note

    Avoid using special characters in the item names including:

    "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">".

    These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

  5. Set the analysis metadata (purpose, description and author name) in the corresponding fields and then click Finish.

    A file for the newly created analysis is displayed under the Analysis folder in the DQ Repository tree view. The analysis editor opens with the defined analysis metadata.

Selecting the primary and foreign keys

  1. Click Analyzed Column Sets to display the corresponding view.

    In this example, you want to match the foreign keys in the customer_id column of the sales_fact_1998 table with the primary keys in the customer_id column of the customer table, and vice versa. This will explore the relationship between the two tables to show us for example if every customer has an order in the year 1998.

  2. From the Connection list, select the database connection relevant to the database to which you want to connect.

    You have in this list all the connections you create and centralize in the Studio repository.

  3. Click A Column Set to open the [Column Selection] dialog box.

    If you want to check the validity of the foreign keys, select the column holding the foreign keys for the A set and the column holding the primary keys for the B set.

  4. Browse the catalogs/schemas in your database connection to reach the table holding the column you want to match. In this example, the column to be analyzed is customer_id that holds the foreign keys.

    You can filter the table or column lists by typing the desired text in the Table filter or Column filter fields respectively. The lists will show only the tables/columns that correspond to the text you type in.

  5. Click the table name to display all its columns in the right-hand panel of the [Column Selection] dialog box.

  6. In the list to the right, select the check box of the column holding the foreign keys and then click OK to proceed to the next step.

    You can drag the columns to be analyzed directly from the DQ Repository tree view to the editor.

    If you right-click any of the listed columns in the Analyzed Columns view and select Show in DQ Repository view, the selected column will be automatically located under the corresponding connection in the tree view.

  7. Click B Column Set and follow the same steps to select the column holding the primary keys or drag it from the DQ Repository to the right column panel.

    If you select the Compute only number of rows not in B check box, you will look for any missing primary keys in the column in the B set.

  8. Click Data Filter in the analysis editor to display the view where you can set a filter on each of the analyzed columns.

  9. Press F6 to execute this key-matching analysis.

    A confirmation message is displayed.

  10. Click OK in the message if you want to continue the operation.

    The execution of this type of analysis may takes some time. Wait till the Analysis Results view opens automatically showing the analysis results.

    In this example, every foreign key in the sales_fact_1998 table is identified with a primary key in the customer table. However, 98.22% of the primary keys in the customer table could not be identified with foreign keys in the sales_fact_1998 table. These primary keys are for the customers who did not order anything in 1998.

Through this view, you can also access the actual analyzed data via the data explorer.

To access the analyzed data rows, right-click any of the lines in the table and select:

Option

To...

View match rows

access a list of all rows that could be matched in the two identical column sets

View not match rows

access a list of all rows that could not be matched in the two identical column sets

View rows

access a list of all rows in the two identical column sets

Warning

The data explorer does not support connections which has empty user name, such as Single sign-on of MS SQL Server. If you analyze data using such connection and you try to view data rows in the Data Explorer perspective, a warning message prompt you to set your connection credentials to the SQL Server.

The figure below illustrates in the data explorer the list of all analyzed rows in the two columns.

From the SQL editor, you can save the executed query and list it under the Libraries > Source Files folders in the DQ Repository tree view if you click the save icon on the editor toolbar. For more information, see Saving the queries executed on indicators.

For more information about the data explorer Graphical User Interface, see Main window of the data explorer.