The data quality data mart contains the analyses and reports executed in Talend Studio. The data is stored as a star schema, which consists of fact tables and a number of associated dimension tables.
You can use the Physical Data Model (PDM) of Talend Data Quality to create your own reports with the JasperReports reporting tool, and then use them when creating user-specified reports in Talend Studio.
You may also connect this data mart to your own reporting tools, such as Tableau Software, and find the data quality information in your own business intelligence environment.
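As a rough illustration of what querying the star schema from your own tooling can look like, the sketch below joins a fact table to a dimension table in an in-memory SQLite database. Only the table names TDQ_INDICATOR_VALUE and TDQ_ANALYSIS come from this page. The column names, the sample rows and the use of SQLite are assumptions made for the example, so check the PDM for the actual columns and point the query at your own data mart database.

```python
import sqlite3

# In-memory stand-in for the data mart; a real mart lives in your own database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns: the real ones are defined in the PDM.
    CREATE TABLE TDQ_ANALYSIS (
        analysis_key  INTEGER PRIMARY KEY,
        analysis_name TEXT
    );
    CREATE TABLE TDQ_INDICATOR_VALUE (
        analysis_key    INTEGER REFERENCES TDQ_ANALYSIS(analysis_key),
        indicator_name  TEXT,
        indicator_value INTEGER
    );
    -- Invented sample rows, for illustration only.
    INSERT INTO TDQ_ANALYSIS VALUES
        (1, 'customer_email_check'),
        (2, 'order_date_check');
    INSERT INTO TDQ_INDICATOR_VALUE VALUES
        (1, 'Row Count', 1000),
        (1, 'Null Count', 25),
        (2, 'Row Count', 500),
        (2, 'Null Count', 0);
""")


def null_counts_by_analysis(connection):
    """Join the fact table to its dimension and return (analysis, null count) rows."""
    query = """
        SELECT a.analysis_name, f.indicator_value
        FROM TDQ_INDICATOR_VALUE AS f
        JOIN TDQ_ANALYSIS AS a ON a.analysis_key = f.analysis_key
        WHERE f.indicator_name = 'Null Count'
        ORDER BY a.analysis_name
    """
    return connection.execute(query).fetchall()


rows = null_counts_by_analysis(conn)
for name, count in rows:
    print(name, count)
```

The same pattern (filter on the fact table, then join to a dimension for the descriptive label) is what a reporting tool typically generates when it is pointed at a star schema.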
The physical design of Talend Data Quality includes fact and dimension tables.
Fact tables:
- TDQ_INDICATOR_VALUE: indicator value
- TDQ_OVERVIEW_INDVALUE: overview analyses
- TDQ_MATCH_INDVALUE: comparison analyses
- TDQ_SET_INDVALUE: column set analyses
- TDQ_MATCHING_INDVALUE: match analyses
- TDQ_GROUP_STATISTICS: table storing the group statistics of the match analysis
- TDQ_BLOCKING_KEY: table storing the blocking key definition of the match analysis
- TDQ_MATCHING_KEY: table storing the matching key definition of the match analysis
Fact tables may contain columns that have the following values: NULL (TALEND), N/A (TDQ) and EMPTY (TDQ). The NULL (TALEND) value indicates that the analyzed data is null. The N/A (TDQ) value indicates that a value in the column has no meaning in the data quality context. The EMPTY (TDQ) value indicates that the analyzed data is empty (an empty string is different from a null value in most databases).
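When reading these columns from your own reports, it can help to separate the three markers from ordinary values before aggregating. The following is a minimal sketch of that idea. It assumes the markers are stored as the literal strings shown above, with a single space before the parenthesis. The function name, category labels and sample values are invented for the example.

```python
from collections import Counter

# Marker strings as described in the text; the exact stored spelling is an assumption.
NULL_MARKER = "NULL (TALEND)"
NOT_APPLICABLE_MARKER = "N/A (TDQ)"
EMPTY_MARKER = "EMPTY (TDQ)"


def classify(value):
    """Map a data mart column value to a category label (labels are hypothetical)."""
    if value == NULL_MARKER:
        return "null"            # the analyzed data was null
    if value == NOT_APPLICABLE_MARKER:
        return "not_applicable"  # a value has no meaning in this context
    if value == EMPTY_MARKER:
        return "empty"           # the analyzed data was an empty string
    return "value"               # an ordinary analyzed value


# Invented sample of values read from a column.
sample = ["Paris", "NULL (TALEND)", "EMPTY (TDQ)", "N/A (TDQ)", "Paris"]
summary = Counter(classify(v) for v in sample)
print(summary)
```

Keeping "null" and "empty" as separate categories mirrors the distinction the data mart makes between a null value and an empty string.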
Dimension tables:
- TDQ_ANALYSIS: the analysis instance in a report (the pair of report and analysis IDs forms the functional key).
Because the data in dimension tables changes slowly, historical data is tracked by creating multiple records in the dimension tables with separate keys. A new record is inserted each time a change is made. For more information, see http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2.
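The Type 2 approach described above can be sketched in a few lines. This example is not the Talend implementation: the table dim_analysis, its columns, and the is_current flag are all hypothetical, and SQLite is used only so that the example is self-contained. It shows the general technique of inserting a new row with a new key on every change while leaving the older row in place as history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical dimension table; names are invented for this sketch.
conn.execute("""
    CREATE TABLE dim_analysis (
        surrogate_key INTEGER PRIMARY KEY AUTOINCREMENT,
        analysis_id   TEXT NOT NULL,               -- functional (business) identifier
        analysis_name TEXT NOT NULL,               -- attribute whose history is tracked
        is_current    INTEGER NOT NULL DEFAULT 1   -- assumed flag, common in Type 2 designs
    )
""")


def record_version(connection, analysis_id, analysis_name):
    """Insert a new dimension row when the tracked attribute changes; return its key."""
    current = connection.execute(
        "SELECT surrogate_key, analysis_name FROM dim_analysis "
        "WHERE analysis_id = ? AND is_current = 1",
        (analysis_id,),
    ).fetchone()
    if current is not None and current[1] == analysis_name:
        return current[0]  # nothing changed, keep the existing row
    if current is not None:
        # Retire the previous version but keep it as history.
        connection.execute(
            "UPDATE dim_analysis SET is_current = 0 WHERE surrogate_key = ?",
            (current[0],),
        )
    cursor = connection.execute(
        "INSERT INTO dim_analysis (analysis_id, analysis_name) VALUES (?, ?)",
        (analysis_id, analysis_name),
    )
    return cursor.lastrowid


first_key = record_version(conn, "A-1", "Email check")
same_key = record_version(conn, "A-1", "Email check")          # no change, no new row
second_key = record_version(conn, "A-1", "Email format check")  # change, new row and key

history = conn.execute(
    "SELECT surrogate_key, analysis_name, is_current "
    "FROM dim_analysis ORDER BY surrogate_key"
).fetchall()
print(history)
```

Because fact rows reference the surrogate key rather than the functional identifier, older facts keep pointing at the version of the dimension record that was valid when they were written.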
Dimension tables may contain columns that have the following values: NULL (TALEND), N/A (TDQ) and EMPTY (TDQ). The NULL (TALEND) value indicates that the analyzed data is null. The N/A (TDQ) value indicates that a value in the column has no meaning in the data quality context. The EMPTY (TDQ) value indicates that the analyzed data is empty (an empty string is different from a null value in most databases).
The figure below shows the physical design of the PDM of Talend Data Quality, including how the tables are connected to one another.
The three figures that follow show the parts of the PDM that concern the comparison analyses, the overview analyses and the analyses of a set of columns.