Data quality data mart - Cloud - 8.0

Talend Studio User Guide

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-29
Available in: Big Data Platform, Data Fabric, Data Management Platform, Data Services Platform, MDM Platform, Real-Time Big Data Platform

The data quality data mart contains the analyses and reports executed in Talend Studio. The data is stored as a star schema, which consists of fact tables and a number of associated dimension tables.

You can use the Physical Data Model (PDM) of Talend Data Quality to design your own reports with the JasperReports reporting tool, and then use them when creating user-defined reports in Talend Studio.

You can also connect this data mart to your own reporting tools, such as Tableau Software, and access the data quality information from your own business intelligence environment.
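As a rough illustration of the kind of query an external reporting tool might run against a star schema, the sketch below joins a fact table to a dimension table and aggregates a measure. It uses an in-memory SQLite database as a stand-in for the data mart. The table names TDQ_INDICATOR_VALUE and TDQ_ANALYSIS come from the lists further down this page, but every column name (analysis_key, analysis_name, int_value) and all sample rows are hypothetical; check the PDM for the real columns.

```python
import sqlite3


def build_demo_mart() -> sqlite3.Connection:
    """Create an in-memory stand-in for the data mart.

    All column names and rows are hypothetical and exist only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE TDQ_ANALYSIS (
            analysis_key  INTEGER PRIMARY KEY,  -- hypothetical surrogate key
            analysis_name TEXT                  -- hypothetical descriptive attribute
        );
        CREATE TABLE TDQ_INDICATOR_VALUE (
            analysis_key INTEGER REFERENCES TDQ_ANALYSIS(analysis_key),
            int_value    INTEGER                -- hypothetical measure
        );
        INSERT INTO TDQ_ANALYSIS VALUES (1, 'customer_columns'), (2, 'order_columns');
        INSERT INTO TDQ_INDICATOR_VALUE VALUES (1, 120), (1, 30), (2, 75);
        """
    )
    return conn


def totals_per_analysis(conn: sqlite3.Connection) -> list:
    """Join the fact table to its dimension table and sum the measure."""
    query = """
        SELECT a.analysis_name, SUM(f.int_value)
        FROM TDQ_INDICATOR_VALUE AS f
        JOIN TDQ_ANALYSIS AS a ON a.analysis_key = f.analysis_key
        GROUP BY a.analysis_name
        ORDER BY a.analysis_name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, total in totals_per_analysis(build_demo_mart()):
        print(name, total)
```

The same pattern (measures in the fact table, descriptive attributes in the dimension table, a join on the dimension key) applies whichever database actually hosts the data mart.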

The physical design of Talend Data Quality includes fact and dimension tables.

Fact tables:

  • TDQ_INDICATOR_VALUE: indicator value
  • TDQ_OVERVIEW_INDVALUE: overview analyses
  • TDQ_MATCH_INDVALUE: comparison analyses
  • TDQ_SET_INDVALUE: column set analyses
  • TDQ_MATCHING_INDVALUE: match analyses
  • TDQ_GROUP_STATISTICS: table storing the group statistics of the match analysis
  • TDQ_BLOCKING_KEY: table storing the blocking key definition of the match analysis
  • TDQ_MATCHING_KEY: table storing the matching key definition of the match analysis

Fact tables may contain columns that hold one of the following special values: NULL (TALEND), N/A (TDQ) and EMPTY (TDQ). NULL (TALEND) indicates that the analyzed data is null. N/A (TDQ) indicates that a value in this column would have no meaning in the data quality context. EMPTY (TDQ) indicates that the analyzed data is an empty string, which most databases treat as different from a null value.
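When reading these columns from your own code, it can help to separate the three special values from ordinary data before aggregating or displaying it. The following is a minimal sketch, assuming the markers are stored as the exact literal strings quoted above; the `Marker` enum and the `decode_cell` helper are hypothetical names introduced for this illustration.

```python
from enum import Enum
from typing import Optional, Tuple


class Marker(Enum):
    """The three special values described above (assumed stored verbatim)."""
    NULL = "NULL (TALEND)"         # the analyzed data was null
    NOT_APPLICABLE = "N/A (TDQ)"   # a value has no meaning in this column
    EMPTY = "EMPTY (TDQ)"          # the analyzed data was an empty string


# Lookup table from the stored string to its marker.
_MARKERS = {m.value: m for m in Marker}


def decode_cell(raw: str) -> Tuple[Optional[Marker], Optional[str]]:
    """Return (marker, None) for a special value, or (None, raw) otherwise."""
    marker = _MARKERS.get(raw)
    if marker is not None:
        return marker, None
    return None, raw


if __name__ == "__main__":
    for cell in ["Paris", "NULL (TALEND)", "N/A (TDQ)", "EMPTY (TDQ)"]:
        print(repr(cell), "->", decode_cell(cell))
```

Keeping the null and empty cases distinct preserves the difference the data mart records between a missing value and an empty string.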

Dimension tables:

  • TDQ_ANALYSIS: the analysis instance in a report (the pair of report ID and analysis ID forms the functional key).

Because the data in dimension tables changes slowly, its history is tracked by keeping multiple records in the dimension tables, each with a separate key. A new record is inserted each time a change is made. For more information, see Slowly changing dimension.
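The idea of inserting a new record with its own key on every change, rather than updating the existing record, can be sketched in a few lines. This is a generic illustration of that pattern and not the actual implementation used by Talend Studio; the `DimensionRow` fields and the `record_change` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DimensionRow:
    surrogate_key: int  # hypothetical: unique per stored version
    business_id: str    # hypothetical: stable identifier of the entity
    label: str          # hypothetical: attribute whose changes are tracked


def record_change(rows: List[DimensionRow], business_id: str, label: str) -> DimensionRow:
    """Append a new version when the attribute changed; never update in place."""
    versions = [r for r in rows if r.business_id == business_id]
    if versions and versions[-1].label == label:
        return versions[-1]  # nothing changed, keep the current version
    new_row = DimensionRow(len(rows) + 1, business_id, label)
    rows.append(new_row)     # older versions stay available as history
    return new_row


if __name__ == "__main__":
    table: List[DimensionRow] = []
    record_change(table, "A1", "Customers")
    record_change(table, "A1", "Customers")      # no change, no new row
    record_change(table, "A1", "Customers v2")   # change, new row and new key
    for row in table:
        print(row)
```

Because earlier rows are never overwritten, a fact row that references an older key still resolves to the attribute values that were valid when it was written.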

Dimension tables may contain columns that hold one of the following special values: NULL (TALEND), N/A (TDQ) and EMPTY (TDQ). NULL (TALEND) indicates that the analyzed data is null. N/A (TDQ) indicates that a value in this column would have no meaning in the data quality context. EMPTY (TDQ) indicates that the analyzed data is an empty string, which most databases treat as different from a null value.

The figure below shows the physical design of the PDM of Talend Data Quality, including how the tables are connected to each other.

The three figures that follow show the parts of the PDM that concern the comparison analyses, the overview analyses and the analyses of a set of columns.

Physical Data Model of Talend Data Quality.
Comparison analyses.
Overview analyses.
Analyses of a set of columns.