Tracking data changes using Slowly Changing Dimensions (type 0 through type 3)

SCD

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Big Data
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Data Integration
Talend Open Studio for Big Data
Talend Data Fabric
Talend Big Data Platform
Talend Open Studio for Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Talend ESB
task
Design and Development > Third-party systems > Business Intelligence components > SCD components
Data Governance > Third-party systems > Business Intelligence components > SCD components
Data Quality and Preparation > Third-party systems > Business Intelligence components > SCD components
EnrichPlatform
Talend Studio

This scenario describes a Job that stores and manages both the current and historical employee data in a MySQL table using SCD (Slowly Changing Dimensions).

For more technologies supported by Talend, see Talend components.

The input data contains various employee details including name, age, role, and salary, and another id column is added to help ensuring the unicity of the input data.

At first, the following employee data is inserted to a new MySQL table using SCD:

id;name;age;role;salary
1;Mark Smith;30;tester;11000.00
2;Thomas Johnson;32;developer;12000.00
3;Teddy Brown;33;tester;13000.00

Then the table is updated using SCD with the following renewed employee data.

id;name;age;role;salary
1;Mark Smith;31;tester;11000.00
2;Thomas Johnson;32;developer;12000.00
3;Teddy Brown;33;writer;13500.00

You can see the age of Mark Smith is updated from 30 to 31, the role of Teddy Brown is changed from tester to writer, and his salary is raised from 13000.00 to 13500.00. In this scenario,

  • we don't want to track the data changes for the name field, so we will perform Type 0 SCD on it,

  • we want the new age data to overwrite the existing data, so we will perform Type 1 SCD on it,

  • we want to retain the full history of the role data, and always create a new record with the changed data and close the previous record, so we will perform Type 2 SCD on it,

  • we want to keep the current and previous dimension values for the salary field, so we will perform Type 3 SCD on it.

For more information about SCD types, see SCD management methodology.