Step 2: Loading changes from the source database table into the Hive external table - 6.4

Change Data Capture

Author: Talend Documentation Team
Version: 6.4
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Platform: Talend Studio
This step reads only the changes from the source database table and loads them into the Hive external table employee_extnl.

Procedure

  1. The Big Data Batch Job is as follows:
    • The source table is filtered on the last-updated timestamp, which is maintained in the cdc_control table. This is done with the following SQL in the Where condition of the tMysqlInput component:

      where cdc.Table_Name='employee_table' and emp.`Record_DateTime` > cdc.Last_executed

    • The tAggregateRow component produces one row per run, which is loaded into the cdc_control table with an update-else-insert operation: if a record for the table already exists, it is updated with the run time of the Job; otherwise a new record is inserted.

      The run time can be set with the TalendDate.getCurrentDate() function.

    The following shows the data in the source employee_table table after new records are added:
  2. Run the Job.
    The following shows the data in the employee_extnl external Hive table after the Job is run:
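The filter-then-stamp cycle of this Job can be sketched in plain Java. This is a minimal in-memory illustration, not Talend-generated code: the CdcSketch class, the Employee class and the map standing in for the cdc_control table are hypothetical, LocalDateTime.now() stands in for TalendDate.getCurrentDate(), and the filter assumes the query joins employee_table and cdc_control with an inner join (so a table with no control row returns nothing).

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CdcSketch {

    // Hypothetical source row: only Record_DateTime matters to the CDC filter.
    static final class Employee {
        final int id;
        final String name;
        final LocalDateTime recordDateTime;

        Employee(int id, String name, LocalDateTime recordDateTime) {
            this.id = id;
            this.name = name;
            this.recordDateTime = recordDateTime;
        }
    }

    // Mirrors: cdc.Table_Name = tableName AND emp.Record_DateTime > cdc.Last_executed
    // Assumes an inner join, so a table with no control row yields no changes.
    static List<Employee> changedSince(List<Employee> source,
                                       Map<String, LocalDateTime> cdcControl,
                                       String tableName) {
        List<Employee> changes = new ArrayList<>();
        LocalDateTime lastExecuted = cdcControl.get(tableName);
        if (lastExecuted == null) {
            return changes;
        }
        for (Employee e : source) {
            // isAfter is strict, matching the SQL ">" operator.
            if (e.recordDateTime.isAfter(lastExecuted)) {
                changes.add(e);
            }
        }
        return changes;
    }

    // Update-else-insert on the control map: put() overwrites the entry for an
    // existing table name, otherwise it adds a new one.
    static void recordRun(Map<String, LocalDateTime> cdcControl,
                          String tableName, LocalDateTime runTime) {
        cdcControl.put(tableName, runTime);
    }

    public static void main(String[] args) {
        // Hypothetical control row left by a previous run.
        Map<String, LocalDateTime> cdcControl = new HashMap<>();
        cdcControl.put("employee_table", LocalDateTime.of(2017, 1, 1, 0, 0));

        List<Employee> source = List.of(
                new Employee(1, "loaded earlier", LocalDateTime.of(2016, 12, 31, 23, 0)),
                new Employee(2, "added later", LocalDateTime.of(2017, 1, 2, 9, 30)));

        List<Employee> changes = changedSince(source, cdcControl, "employee_table");
        System.out.println(changes.size() + " changed row(s)"); // prints "1 changed row(s)"

        // Stand-in for TalendDate.getCurrentDate(): stamp this run in the control map.
        recordRun(cdcControl, "employee_table", LocalDateTime.now());
    }
}
```

In this sketch the run time is taken after the read, so a row whose Record_DateTime falls between the read and the stamp would be skipped on the next run; capturing the time before reading the source is one way to close that gap, if it matters for your data.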