CDC architectural overview - Cloud - 7.3

Talend Studio User Guide

Version: Cloud 7.3
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-13

Data warehousing involves extracting data from one or more databases and transporting it into one or more target systems for analysis. However, this means extracting and transporting huge volumes of data, which is very expensive in both resources and time.

The ability to capture only the changed source data and to move it from a source to one or more target systems in real time is known as Change Data Capture (CDC). Capturing only the changes reduces network traffic and thus helps reduce ETL time.

The CDC feature, introduced in Talend Studio, simplifies the process of identifying the data that has changed since the last extraction. CDC in Talend Studio quickly identifies and captures data that has been added to, updated in, or removed from database tables and makes this change data available for future use by applications or individuals. The CDC feature is available for Oracle, MySQL, DB2, PostgreSQL, Sybase, MS SQL Server, Informix, Ingres, Teradata, and AS/400.

Warning: The CDC feature works only with database systems running on the same server.

Three different CDC modes are available in Talend Studio:

  • Trigger: this is the default mode used by CDC components.

  • Redo/Archive log: this mode is used with Oracle v11 and earlier versions, and with AS/400.

  • XStream: this mode is used only with Oracle v12 with OCI.

For detailed information on these three modes, see the following sections.