Scenario 2: Retrieving modified data using Oracle CDC Redo log mode - 6.1

Talend Components Reference Guide

EnrichVersion
6.1
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for Data Quality
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
task
Data Governance
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

Warning

Before modifying data in the LEADFACT table and retrieving these modifications using the tOracleCDC component, you must activate the Redo log mode in the Oracle database that holds the table to monitor. For more information on how to activate the Redo log mode, see Talend Studio User Guide.

This scenario is based on the preceding one. It continuously populates and modifies the data stored in the LEADFACT table, and retrieves and saves, every night, these modifications in a dedicated table using the CDC function. These modifications could be then extracted by the various concerned departments.

Step 1: Configuring CDC

Before being able to retrieve modified data from the LEADFACT table, you must:

  1. first set up the DB connection dedicated to CDC,

  2. second, set up a DB connection to the source data and identify the table to catch,

  3. finally, set the connection between the CDC and the data.

    To do that:

  4. In the Repository tree view and under Metadata, create a connection to your database dedicated to CDC, cdc_publisher in this scenario.

    Note

    Ensure that the DB connection for CDC is on the same server with the source data to which changes are to be captured.

  5. In the Repository tree view and under Metadata, create a connection to the source data warehouse and identify the table to catch, LEADFACT in this scenario.

  6. Right-click the connection to the orcle database and select Retrieve schema from the drop-down menu to retrieve the schema of the table to catch.

  7. Right-click CDC Foundation of the cdc-data connection and select Create CDC in the drop-down menu.

    The [Create Change Data Capture] dialog box displays

  8. Click the three-dot button next to the Set link Connection field and select the connection that corresponds to CDC, cdc_publisher in this example.

  9. In the Options area, select the Log mode check box.

  10. Click Create Subscriber. The [Create Subscriber and Execute SQL Script] dialog box displays.

  11. Click Execute and then Close to close the dialog box.

  12. Click Finish to validate the creation of the subscriber table.

    In the CDC Foundation folder, the relevant subscriber table displays.

    You must specify which table the subscriber wants to subscribe to and then

  13. Right-click the LEADFACT schema of the cdc-data connection and select Add CDC in the drop-down list. The [Create Subscriber and Execute SQL Script] dialog box displays.

    Note

    For Oracle databases and in Talend Studio version 3.2 or subsequent versions, the CDC system creates an alias for the source table(s) monitored. This avoids problems relating to identifier length upon creation of the table of changes and its associated view. For CDC systems which are already in place, the table names are maintained.

  14. Click Execute and then Close to validate the subscription.

    In the CDC Foundation folder, the two created tables display and the schema node of the catched table is marked with a green CDC symbol.

Step 2: Modifying the data

Modify the data of your clients in the LEADFACT table, for example, convert all customer names to upper case.

  1. Double-click the tMap component and enter row1.LASTNAME.toUpperCase()in front of the CLASTNAME column to convert all customer names to upper case.

  2. Click Ok.

  3. Double-click the tOracleOutput component.

  4. In the Action on table field, select None.

  5. In the Action on data field, select Insert or update to insert or update table data.

  6. Save your Job and press F6 to execute it.

To view all changes done on data, right-click the LEADFACT table and select View All Changes to open the relevant dialog box.

Step 3: Extracting change data

After setting up the CDC environment, you can now design a job using the Oracle CDC component to incrementally extract the change data from the LEADFACT table. To do that:

  1. From the Palette, drop the OracleCDC and tLogRow components to the design workspace.

  2. Link the two components using a Row Main link.

  3. Double-click tOracleCDC to open its Basic settings view and define its properties.

  4. Set Property Type to Repository and then select the schema corresponding to your Oracle DB table, cdc_publisher in this scenario. The connection details will display automatically in the corresponding fields

    Note

    If you have not stored the data warehouse connection details in the Metadata folder in the Repository tree view, select Built-in in the property type list and set the connection details manually.

  5. In the Schema using CDC field, select Repository and then select the schema of the LEADFACT table stored in the Metadata folder.

  6. In the Table using CDC field, enter the name of the table captured by the CDC, in this scenario Leadfact.

  7. In the Events to catch field, select the check boxes corresponding to the type of the modified data the subscriber will extract. In this scenario, select the three check boxes for the three subscribers.

  8. Double-click tLogRow to display its Basic settings view and define its properties.

  9. Click the Sync columns button to retrieve the schema from the preceding component.

  10. Click Edit schema to open the schema dialog box.

  11. In the TALEND_CDC_CREATION_DATE line of the Date Pattern column, enter between brackets the desired date format: "yyyy-MM-dd".

  12. Save your Job and press F6 to execute it.

In the Redo log mode, changes done on data are indicated in the following way: modifications are equal to first, an "update and delete" operation (UO), and then to an "update and insert" operation (UN). Thus, client data displays twice:

- First, data is deleted (UO).

-Second, data is inserted (UN).

Once these modifications are extracted, they are no more available in the modified table. To verify their extraction, right-click the LEADFACT table catched by the CDC and then select Views All Changes. The extracted changes do not display anymore.

For another CDC scenario using the Trigger mode, see Scenario 2: Retrieving modified data using CDC.