Implemented PDM - 6.1

Talend Data Fabric Studio User Guide

You can use the Physical Data Model (PDM) of Talend Data Quality to create your own reports with the JasperReports reporting tool, and then use them when creating user-specified reports in the studio.

The physical design of Talend Data Quality includes fact and dimension tables.

Fact tables:

  • TDQ_INDICATOR_VALUE: indicator value.

  • TDQ_OVERVIEW_INDVALUE: overview analyses.

  • TDQ_MATCH_INDVALUE: comparison analyses.

  • TDQ_SET_INDVALUE: column set analyses.

Columns in fact tables may contain the following special values: NULL (TALEND), N/A (TDQ) and EMPTY (TDQ). The NULL (TALEND) value indicates that the analyzed data is null. The N/A (TDQ) value indicates that a value in this column has no meaning in the data quality context. The EMPTY (TDQ) value indicates that the analyzed data is an empty string (in most databases, an empty string is different from a null value).
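When a custom report reads these columns, it can help to translate the special values into a readable meaning. The following is a minimal sketch, assuming the values are stored exactly as the strings named above; the function name describe_cell is hypothetical and is not part of Talend Data Quality.

```python
# Special values as named in the text above.
# Assumption: they are stored as exactly these strings.
NULL_VALUE = "NULL (TALEND)"
NA_VALUE = "N/A (TDQ)"
EMPTY_VALUE = "EMPTY (TDQ)"


def describe_cell(stored: str) -> str:
    """Return a readable meaning for a value read from a PDM column."""
    if stored == NULL_VALUE:
        return "analyzed data was null"
    if stored == NA_VALUE:
        return "not applicable in the data quality context"
    if stored == EMPTY_VALUE:
        return "analyzed data was an empty string"
    # Anything else is treated as a literal analyzed value.
    return f"literal value: {stored}"
```

For example, describe_cell("EMPTY (TDQ)") reports an empty string, while any other text is passed through as a literal value.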

Dimension tables:

  • TDQ_ANALYSIS: the analysis instance in a report (meaning that the pair of the report and analysis ids forms the functional key).

  • TDQ_INDICATOR_DEFINITION: indicator definition (Row count, Frequency table...).

  • TDQ_ANALYZED_ELEMENT: analyzed element (mainly a column).

  • TDQ_DAY_TIME: day time dimension (hours, minutes).

    The time data is stored in UTC (Coordinated Universal Time).

  • TDQ_VALUE: the table listing the value when the frequency table indicator is computed.

  • TDQ_INDICATOR_OPTIONS: options used by indicators.

  • TDQ_CALENDAR: date dimension.

  • TDQ_ANALYZED_SET: the mapping table between the indicator and the analyzed element sets, column comparison analyses.

  • TDQ_INDICATOR_VALUE: indicator value.

  • TDQ_MATCH_INDVALUE: fact table of the comparison analyses.

  • TDQ_OVERVIEW_INDVALUE: fact table for table, schema, catalog overview indicator.

  • TDQ_PRODUCT: information about the used TDQ platform.

  • TDQ_SET_INDVALUE: fact table of the indicators measuring a set of columns.

  • TDQ_TABLE_ANALYZED_SET: relation table for TDQ_ANALYZED_ELEMENT and TDQ_SET_INDValue.
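Because the time data in TDQ_DAY_TIME is stored in UTC, a report that should display local time has to convert it. The sketch below shows one way to do this in Python, assuming the report has already read a calendar date plus an hour and a minute from the data mart; the function name to_local and its parameters are hypothetical, and a fixed UTC offset is used only to keep the example self-contained.

```python
from datetime import date, datetime, time, timedelta, timezone


def to_local(cal_date: date, hour: int, minute: int, offset: timedelta) -> datetime:
    """Combine a UTC date and time of day, then shift to a fixed UTC offset."""
    # Interpret the stored date and time as UTC, as stated for TDQ_DAY_TIME.
    utc_dt = datetime.combine(cal_date, time(hour, minute), tzinfo=timezone.utc)
    # Convert to the target offset (hypothetical input, e.g. UTC+2).
    return utc_dt.astimezone(timezone(offset))


local = to_local(date(2015, 12, 31), 23, 30, timedelta(hours=2))
```

Note that the conversion can change the calendar date as well as the hour, so the date and the time of day should be converted together rather than separately.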

Because dimension tables hold slowly changing data, historical data is tracked by creating multiple records in the dimension tables, each with a separate key. A new record is inserted each time a change is made. For more information, see http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2.
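The Type 2 approach can be illustrated with a small in-memory SQLite table. This is only a sketch of the general technique, not the actual Talend schema: the table demo_dim, its columns and the is_current flag are all hypothetical names chosen for the example.

```python
import sqlite3


def record_change(conn: sqlite3.Connection, business_id: str, name: str) -> None:
    """Type 2 change: keep the old row and insert a new row with a new key."""
    row = conn.execute(
        "SELECT name FROM demo_dim WHERE business_id = ? AND is_current = 1",
        (business_id,),
    ).fetchone()
    if row is not None and row[0] == name:
        return  # nothing changed, so no new version is needed
    # Retire the current version (hypothetical is_current flag).
    conn.execute(
        "UPDATE demo_dim SET is_current = 0 WHERE business_id = ? AND is_current = 1",
        (business_id,),
    )
    # Insert the new version; the surrogate key pk is generated automatically.
    conn.execute(
        "INSERT INTO demo_dim (business_id, name, is_current) VALUES (?, ?, 1)",
        (business_id, name),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_dim ("
    " pk INTEGER PRIMARY KEY AUTOINCREMENT,"
    " business_id TEXT NOT NULL,"
    " name TEXT NOT NULL,"
    " is_current INTEGER NOT NULL)"
)
record_change(conn, "a1", "Customer analysis")
record_change(conn, "a1", "Customer analysis v2")
rows = conn.execute(
    "SELECT pk, name, is_current FROM demo_dim ORDER BY pk"
).fetchall()
```

After the two calls, the table holds both versions of the same business record under different keys, which is why fact rows that reference the older key keep pointing at the historical values.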

Columns in dimension tables may contain the following special values: NULL (TALEND), N/A (TDQ) and EMPTY (TDQ). The NULL (TALEND) value indicates that the analyzed data is null. The N/A (TDQ) value indicates that a value in this column has no meaning in the data quality context. The EMPTY (TDQ) value indicates that the analyzed data is an empty string (in most databases, an empty string is different from a null value).

The figure below shows the physical design of the PDM of Talend Data Quality, including how the tables are connected to each other.

The three figures that follow show the parts of the PDM that concern the comparison analyses, the overview analyses and the analyses of a set of columns.