How to set up CDC in Oracle Redo/Archive log mode - 6.1

Talend Big Data Studio User Guide

EnrichVersion
6.1
EnrichProdName
Talend Big Data
task
Design and Development
EnrichPlatform
Talend Studio

The Oracle Redo/Archive log mode used in Talend is equivalent to the archive log mode of Oracle.

The following three sections detail the prerequisites for using CDC in Redo/Archive log mode for Oracle databases and provide a two-step example of how to set up a CDC environment using the Oracle Redo/Archive log mode in Talend Studio: the first step explains how to configure your system for CDC and the second, how to extract the modified data.

Prerequisites for the Oracle Redo/Archive log mode

The CDC feature uses Java. Therefore, make sure Java is enabled when you install Oracle database.

The CDC feature in this mode for Oracle is available for the 10g version of the Oracle database and later versions. Before being able to use CDC in Redo/Archive log mode in Talend Studio, the administrator of the database to be supervised should do the following:

  1. Activate the active log mode in the Oracle database.

  2. Set up CDC in the Oracle database.

  3. Create and give all rights to the source user.

  4. Create and give all rights to the publisher.

Activate the archive log mode in Oracle

To do so, connect to the Oracle database as an administrator and activate the active log mode using the following queries:

connect / as sysdba;
shutdown;
startup exclusive mount;
alter database archivelog;
alter database open;

Set up CDC in Oracle

To do so, create a tablespace for the source user and the publisher respectively, then create a source user and give it all the rights necessary to make modifications, and create a publisher and give it all the rights necessary to capture and publish modifications.

In the example below, the $ORACLE_PATH varies depending on where Oracle is installed. The source user is called source, and the publisher is called publisher:

create tablespace SOURCE datafile '$ORACLE_PATH/oradata/Oracle/SOURCE.dbf' size 50M;

create user source
identified by source
default tablespace SOURCE 
quota unlimited on SOURCE
quota unlimited on SYSTEM
quota unlimited on SYSAUX;

grant connect, create table to source;
grant create tablespace to source;
grant unlimited tablespace to source;
grant select_catalog_role to source;
grant execute_catalog_role to source;
grant create sequence to source;
grant create session to source;
grant dba to source;
grant execute on SYS.DBMS_CDC_PUBLISH to source;
alter user source quota unlimited on users;

create tablespace PUBLISHER datafile '$ORACLE_PATH/oradata/Oracle/PUBLISHER.dbf' size 50M;

create user publisher
identified by publisher
default tablespace PUBLISHER
quota unlimited on PUBLISHER
quota unlimited on SYSTEM
quota unlimited on SYSAUX;

grant connect, create table to publisher;
grant create tablespace to publisher;
grant unlimited tablespace to publisher;
grant select_catalog_role to publisher;
grant execute_catalog_role to publisher;
grant create sequence to publisher;
grant create session to publisher;
grant dba to publisher;
grant execute on SYS.DBMS_CDC_PUBLISH to publisher;
execute DBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE(GRANTEE=>'publisher');

The select_catalog_role procedure allows the publisher to consult all Oracle dictionaries.

The execute_catalog_role procedure allows the publisher to execute the dictionary procedures.

The SYS.DBMS_CDC_PUBLISH procedure allows the publisher to configure the CDC system that will capture and publish change data in one or more source tables.

The procedure: DBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE(GRANTEE=>'publisher')gives the user the administration privileges necessary to carry out data replication (stream). The GRANT_ADMIN_PRIVILEGE procedure allows the user to carry out data capture and propagation operations.

How to configure CDC in Oracle Redo/Archive log mode

The followings are configuration steps that need to be set up once for a given publisher/subscriber scenario.

Step 1: Set up a publisher

To set up a publisher, you need to:

  1. Create a new Job in Talend Studio.

  2. Set a DB connection dedicated to CDC by using the "publisher" user that has all necessary rights.

  3. Set a DB connection to the database you want to supervise.

Step 2: Identify the source table

To identify the table(s) to catch, right-click the DB connection for the database you want to monitor and select Retrieve Schema, then proceed to retrieve and load the source table schema in the repository.

In this example, the source table is client, which contains three columns id, name and age.

Step 3: Retrieve and process changes

To retrieve modified data, define the connection between CDC and data:

  1. Right-click the relevant CDC Foundation folder and proceed to connect to the Oracle database to be monitored.

  2. Select Create CDC to open the [Create Change Data Capture] dialog box.

  3. Click the three-dot button next to the Set Link Connection field to select the connection that corresponds to CDC. Then define the user for Oracle - publisher in this example. This user will create the change tables that store modifications and will activate change captures for the source table.

  4. In the Options area, select Log mode as the option for capturing changes.

  5. Click Create Subscriber. The [Create Subscriber and Execute SQL Script] dialog box appears.

  6. Click Execute and then Close to close the dialog box.

  7. Click Finish in the [Create Change Data Capture] dialog box.

In the CDC Foundation folder, the subscription table schema appears. An icon also appears to show that the change capture for the source table is activated.

Step 4: Create the change table, subscribe to the source table and activate the subscription

You must specify the table to which the subscriber wants to subscribe and then activate its subscription.

  1. Right-click the schema that corresponds to the source table and select Add CDC. The [Create Subscriber and Execute SQL Script] dialog box appears.

    Note

    For Oracle databases and for versions 3.2 + of Talend Studio, the CDC system creates an alias for the source table(s) monitored. This helps to avoid problems due to the length of identifiers upon creation of the change table and its associated view. For CDC systems which are already set up, the table names are retained.

  2. Click Execute to activate the subscription to the source table and then click Close to close the dialog box.

    In the CDC Foundation folder, the table that holds the modified data and the associated view schemas appear.

  3. To see the changes made to data, right-click the corresponding table in the Table schemas folder and select View All Changes to open the corresponding dialog box.

    The TALEND_CDC_TYPE column of the [View All Changes] dialog box indicates all of the different changes caught.

    The changes are caught as follows: I indicates that the data has been inserted, UN indicates that the data has been updated, and D indicates that the data has been deleted.

    The columns of the source table and their values are also displayed.

  4. Click Finish to close the dialog box.

How to extract change data modified in Oracle Redo/Archive log mode

After setting up the CDC environment, you can now design a Job in Talend Studio using the CDC Oracle component to extract the change data from the source system.

For an example of how to use a CDC component and for more information on the properties and the parameters of the tOracleCDC component, see Talend Components Reference Guide.

If you want to use CDC with an Oracle database, proceed as below:

  1. From the Repository tree view, drop the source table to the design workspace and select the tOracleCDC component in the [Components] dialog box, drop tLogRow from the Palette to the design workspace, and link the two components together using a Row Main connection.

  2. Double-click tOracleCDC to display its Basic settings.

    The Property type is set to Repository since we used the connection information related to CDC stored locally in the Repository tree view. All connection fields are automatically filled in.

    In the Schema using CDC, Repository is selected and this way the schema corresponding to Oracle source table is automatically retrieved.

    The name of the source table that holds change data appears in the Table using CDC field. In this example, the table is called CLIENT.

    Note

    The CDC Log Mode check box is selected since you select this mode when setting up the CDC environment.

  3. For the Events to catch option, select the check box corresponding to the event(s) you want to catch. In this example, we want to catch the three events, Insert, Update and Delete.

  4. Save your Job and press F6 to execute it.

In the console, you can read the output results that correspond to what you can see in the [View All Changes] dialog box.