Before modifying data in the LEADFACT table and retrieving these modifications using the tOracleCDC component, you must activate the Redo log mode in the Oracle database that holds the table to monitor. For more information on how to activate the Redo log mode, see Talend Studio User Guide.
This scenario is based on the preceding one. It continuously populates and modifies the data stored in the LEADFACT table, and retrieves and saves, every night, these modifications in a dedicated table using the CDC function. These modifications could be then extracted by the various concerned departments.
Before being able to retrieve modified data from the LEADFACT table, you must:
first set up the DB connection dedicated to CDC,
second, set up a DB connection to the source data and identify the table to catch,
finally, set the connection between the CDC and the data.
To do that:
In the Repository tree view and under Metadata, create a connection to your database dedicated to CDC, cdc_publisher in this scenario.
Ensure that the DB connection for CDC is on the same server with the source data to which changes are to be captured.
In the Repository tree view and under Metadata, create a connection to the source data warehouse and identify the table to catch, LEADFACT in this scenario.
Right-click the connection to the orcle database and select Retrieve schema from the drop-down menu to retrieve the schema of the table to catch.
Right-click CDC Foundation of the cdc-data connection and select Create CDC in the drop-down menu.
The [Create Change Data Capture] dialog box displays
Click the three-dot button next to the Set link Connection field and select the connection that corresponds to CDC, cdc_publisher in this example.
In the Options area, select the Log mode check box.
Click Create Subscriber. The [Create Subscriber and Execute SQL Script] dialog box displays.
Click Execute and then Close to close the dialog box.
Click Finish to validate the creation of the subscriber table.
In the CDC Foundation folder, the relevant subscriber table displays.
You must specify which table the subscriber wants to subscribe to and then
Right-click the LEADFACT schema of the cdc-data connection and select Add CDC in the drop-down list. The [Create Subscriber and Execute SQL Script] dialog box displays.
For Oracle databases and in Talend Studio version 3.2 or subsequent versions, the CDC system creates an alias for the source table(s) monitored. This avoids problems relating to identifier length upon creation of the table of changes and its associated view. For CDC systems which are already in place, the table names are maintained.
Click Execute and then Close to validate the subscription.
In the CDC Foundation folder, the two created tables display and the schema node of the catched table is marked with a green CDC symbol.
Modify the data of your clients in the LEADFACT table, for example, convert all customer names to upper case.
Double-click the tMap component and enter
row1.LASTNAME.toUpperCase()in front of the CLASTNAME column to convert all customer names to upper case.
Double-click the tOracleOutput component.
In the Action on table field, select None.
In the Action on data field, select Insert or update to insert or update table data.
Save your Job and press F6 to execute it.
To view all changes done on data, right-click the LEADFACT table and select View All Changes to open the relevant dialog box.
After setting up the CDC environment, you can now design a job using the Oracle CDC component to incrementally extract the change data from the LEADFACT table. To do that:
From the Palette, drop the OracleCDC and tLogRow components to the design workspace.
Link the two components using a Row Main link.
Double-click tOracleCDC to open its Basic settings view and define its properties.
Set Property Type to Repository and then select the schema corresponding to your Oracle DB table, cdc_publisher in this scenario. The connection details will display automatically in the corresponding fields
If you have not stored the data warehouse connection details in the Metadata folder in the Repository tree view, select Built-in in the property type list and set the connection details manually.
In the Schema using CDC field, select Repository and then select the schema of the LEADFACT table stored in the Metadata folder.
In the Table using CDC field, enter the name of the table captured by the CDC, in this scenario Leadfact.
In the Events to catch field, select the check boxes corresponding to the type of the modified data the subscriber will extract. In this scenario, select the three check boxes for the three subscribers.
Double-click tLogRow to display its Basic settings view and define its properties.
Click the Sync columns button to retrieve the schema from the preceding component.
Click Edit schema to open the schema dialog box.
In the TALEND_CDC_CREATION_DATE line of the Date Pattern column, enter between brackets the desired date format:
Save your Job and press F6 to execute it.
In the Redo log mode, changes done on data are indicated in the following way: modifications are equal to first, an "update and delete" operation (UO), and then to an "update and insert" operation (UN). Thus, client data displays twice:
- First, data is deleted (UO).
-Second, data is inserted (UN).
Once these modifications are extracted, they are no more available in the modified table. To verify their extraction, right-click the LEADFACT table catched by the CDC and then select Views All Changes. The extracted changes do not display anymore.
For another CDC scenario using the Trigger mode, see Scenario 2: Retrieving modified data using CDC.