Data quality data mart - 6.3

Talend MDM Platform Studio User Guide

The data quality data mart contains the analyses and reports executed in Talend Studio. The data is stored as a star schema, which consists of fact tables and a number of associated dimension tables.
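As a rough illustration of the star schema layout, the sketch below builds a tiny fact table and one dimension table in an in-memory SQLite database and joins them. All table and column names (demo_indicator_fact, demo_analysis_dim, analysis_key and so on) are hypothetical placeholders chosen for this example. They are not the actual columns of the Talend data mart, which should be taken from the PDM.

```python
import sqlite3

# In-memory database so the sketch has no side effects.
conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one dimension table and one fact table
# that references it through a key column.
conn.executescript("""
CREATE TABLE demo_analysis_dim (
    analysis_key   INTEGER PRIMARY KEY,
    analysis_label TEXT NOT NULL
);
CREATE TABLE demo_indicator_fact (
    fact_key        INTEGER PRIMARY KEY,
    analysis_key    INTEGER NOT NULL
                    REFERENCES demo_analysis_dim(analysis_key),
    indicator_name  TEXT NOT NULL,
    indicator_value INTEGER NOT NULL
);
INSERT INTO demo_analysis_dim VALUES
    (1, 'customer columns'),
    (2, 'order columns');
INSERT INTO demo_indicator_fact VALUES
    (1, 1, 'null count', 4),
    (2, 1, 'row count', 100),
    (3, 2, 'null count', 7);
""")

# Typical star-schema query: filter the facts, then join to the
# dimension table to get a readable label for each analysis.
rows = conn.execute("""
SELECT d.analysis_label, f.indicator_name, f.indicator_value
FROM demo_indicator_fact AS f
JOIN demo_analysis_dim AS d ON d.analysis_key = f.analysis_key
WHERE f.indicator_name = 'null count'
ORDER BY d.analysis_label
""").fetchall()

for label, name, value in rows:
    print(label, name, value)
```

The same join pattern would apply when an external reporting tool reads the data mart: the measures come from a fact table and the descriptive attributes come from the dimension tables it references.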

The data quality data mart makes it easier to access analysis and report data for historical reporting. To share data quality reports with other teams or business users, you can connect to Talend Data Quality Portal, which plugs into the data quality data mart.

You can use the Physical Data Model (PDM) of Talend Data Quality to create your own reports with the JasperReports reporting tool, and then use them when creating user-specified reports in Talend Studio.

You may also connect this data mart to your own reporting tools, such as Tableau Software, and find the data quality information in your own Business Intelligence environment.

The physical design of Talend Data Quality includes fact and dimension tables.

Fact tables:

  • TDQ_INDICATOR_VALUE: indicator value.

  • TDQ_OVERVIEW_INDVALUE: overview analyses.

  • TDQ_MATCH_INDVALUE: comparison analyses.

  • TDQ_SET_INDVALUE: column set analyses.

  • TDQ_MATCHING_INDVALUE: match analyses.

  • TDQ_GROUP_STATISTICS: table to store group statistics.

Fact tables may contain columns that have the following values: NULL (TALEND), N/A (TDQ) and EMPTY (TDQ). The NULL (TALEND) value indicates that the analyzed data is null. The N/A (TDQ) value indicates that a value in the column has no meaning in the data quality context. The EMPTY (TDQ) value indicates that the analyzed data is empty (an empty string is different from a null value in most databases).
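When reading these columns from an external reporting tool, it can help to separate the three marker values from ordinary data. The sketch below is one possible way to do that. It assumes the markers are stored as the exact literal strings shown above, which should be checked against the actual data mart. The names CellKind and classify_cell are hypothetical helpers invented for this example.

```python
from enum import Enum


class CellKind(Enum):
    """What a cell read from a data mart column represents."""
    NULL = "analyzed data is null"
    NOT_APPLICABLE = "no meaning in the data quality context"
    EMPTY = "analyzed data is an empty string"
    VALUE = "ordinary value"


# Assumption: markers appear as exactly these literal strings.
MARKERS = {
    "NULL (TALEND)": CellKind.NULL,
    "N/A (TDQ)": CellKind.NOT_APPLICABLE,
    "EMPTY (TDQ)": CellKind.EMPTY,
}


def classify_cell(cell: str) -> CellKind:
    """Return the marker kind, or VALUE for any other content."""
    return MARKERS.get(cell, CellKind.VALUE)


if __name__ == "__main__":
    for cell in ["NULL (TALEND)", "N/A (TDQ)", "EMPTY (TDQ)", "Paris"]:
        print(repr(cell), "->", classify_cell(cell).name)
```

Keeping the null and empty cases distinct mirrors the point made above that an empty string is not the same as a null value in most databases.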

Dimension tables:

  • TDQ_ANALYSIS: the analysis instance in a report (the pair of report and analysis IDs forms the functional key).

Because the data in dimension tables changes slowly, historical data is tracked by creating multiple records in the dimension tables, each with a separate key. A new record is inserted each time a change is made. For more information, see http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2.
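The history-tracking approach described above can be sketched with an in-memory SQLite table. This is a minimal illustration of a Type 2 slowly changing dimension, not the actual Talend schema. The table demo_analysis_history, its columns and the record_version helper are all hypothetical, and the is_current flag is an assumption added here as one common way to mark the latest version.

```python
import sqlite3


def record_version(conn, report_id, analysis_id, label):
    """Add a new version row for an analysis; older rows are kept."""
    # Retire the current version, if any, without deleting it.
    conn.execute(
        "UPDATE demo_analysis_history SET is_current = 0 "
        "WHERE report_id = ? AND analysis_id = ? AND is_current = 1",
        (report_id, analysis_id),
    )
    # Insert the changed record under a fresh surrogate key.
    cur = conn.execute(
        "INSERT INTO demo_analysis_history "
        "(report_id, analysis_id, label, is_current) VALUES (?, ?, ?, 1)",
        (report_id, analysis_id, label),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE demo_analysis_history (
    surrogate_key INTEGER PRIMARY KEY,  -- separate key per version
    report_id     INTEGER NOT NULL,     -- functional key, part 1
    analysis_id   INTEGER NOT NULL,     -- functional key, part 2
    label         TEXT NOT NULL,
    is_current    INTEGER NOT NULL
)
""")

first_key = record_version(conn, 1, 10, "Customer analysis")
second_key = record_version(conn, 1, 10, "Customer analysis v2")

history = conn.execute(
    "SELECT surrogate_key, label, is_current FROM demo_analysis_history "
    "WHERE report_id = 1 AND analysis_id = 10 ORDER BY surrogate_key"
).fetchall()

for row in history:
    print(row)
```

Both versions share the same report and analysis IDs but have different surrogate keys, so fact rows that reference the older key still resolve to the values that were valid when they were written.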

Dimension tables may contain columns that have the following values: NULL (TALEND), N/A (TDQ) and EMPTY (TDQ). The NULL (TALEND) value indicates that the analyzed data is null. The N/A (TDQ) value indicates that a value in the column has no meaning in the data quality context. The EMPTY (TDQ) value indicates that the analyzed data is empty (an empty string is different from a null value in most databases).

The figure below shows the physical design of the PDM of Talend Data Quality. It also shows how the tables are interconnected.

The three figures that follow show the parts of the PDM that concern the comparison analyses, the overview analyses and the analyses of a set of columns.