How to set up CDC in Trigger mode - 6.5

Talend Big Data Studio User Guide

EnrichVersion
6.5
EnrichProdName
Talend Big Data
task
Design and Development
EnrichPlatform
Talend Studio

The following two sections provide a two-step guide to set up the CDC environment in Trigger mode in Talend Studio: the first step explains how to configure your system for CDC and the second step explains how to extract the modified data.

How to configure CDC in Trigger mode

Below are configuration steps that need to be set up just once for a given publisher/subscriber scenario.

Step 1: Set up a publisher

To set up a publisher, you need to:

  1. Set up a database connection dedicated to CDC.

  2. Set up a connection to the database where data is located.

For more information about how to set up a database connection, see Centralizing database metadata.

Note

If you work with an MS SQL Server, you must set the two connections to the same database but using two different schemas.

Step 2: Identify the source table

To identify the table from which data changes will be captured, right-click the newly created data connection to retrieve the schema of the source table and load it on your repository file system. In this example, the source table is person.

Step 3: Create the subscriber(s) table

To set up the connection between the CDC and the data:

  1. Right-click the CDC Foundation folder under the data connection node and select Create CDC from the contextual menu. The [Create Change Data Capture] dialog box opens up.

  2. In the [Create Change Data Capture] dialog box, click the [...] button next to the Set Link Connection field to select the database connection dedicated to CDC.

    Note that for the database, such as Oracle, which also supports other CDC mode, make sure to select Trigger mode as the option to capture data changes in this step.

  3. Click Create Subscriber and the [Create Subscriber and Execute SQL Script] dialog box opens up.

  4. Click Execute to run the SQL script displayed and then click Close to close the dialog box.

  5. Click Finish in the [Create Change Data Capture] dialog box.

    In the CDC Foundation folder, the CDC database connection and the subscriber table schema appear.

Step 4: Subscribe to the source table and activate the subscription

You must specify the table that the subscriber wants to subscribe to and then activate the subscription.

  1. Right-click the relevant schema of the source table and select add CDC. The [Create Subscriber and Execute SQL Script] dialog box appears.

    Warning

    The source table to be monitored should have a primary key so that the CDC system can identify the rows on which changes have been made. You cannot set up a CDC environment if your source table schema does not have a primary key.

    Note

    For Oracle databases, the CDC system creates an alias for the source table(s) monitored. This helps to avoid problems due to the length of identifiers upon creation of the change table and its associated view. For CDC systems which are already set up, the table names are retained.

  2. In the [Create Subscriber and Execute SQL Script] dialog box, check the event(s) you want to catch: Insert, Update or Delete.

  3. Click Execute to run the SQL script displayed and then click Close to close the dialog box.

    In the CDC Foundation folder, the catch table schemas and the corresponding view schemas appear.

  4. To view any data changes made to the source table, right-click the table in the Table schemas folder and select View All Changes to open the [View All Changes] dialog box.

  5. Click Finish to close the dialog box.

How to extract change data in Trigger mode

After setting up the CDC environment, you can design a Job in Talend Studio using the CDC component which corresponds to the type of the database being used, in order that changes made to the data from the source system can be extracted. In this example, the tTeradataCDC component will be used to show how to extract data changes made to the source table person. For further information on the properties and the parameters of the tTeradataCDC component, see the documentation of the component at https://help.talend.com.

  1. Create a new Job in Talend Studio, add a tTeradataCDC component and a tLogRow component, and link tTeradataCDC to tLogRow using a Row > Main connection.

  2. Double-click tTeradataCDC to open its Basic settings view.

  3. Select Repository from the Property of the CDC connection drop-down list and click the [...] button next to the field to retrieve the schema that corresponds to the database connection dedicated to CDC.

  4. Select Repository from the Schema using CDC drop-down list and click the [...] button next to the field to retrieve the schema that corresponds to the table from which changes will be captured.

  5. Select the check box(es) for the event(s) to be monitored.

  6. Double-click tLogRow and in the Mode area on its Basic settings view select Table (print values in cells of a table) for a better display of the result.

  7. Press Ctrl + S to save the Job and then F6 to execute it.

    On the console, you can read the output results which correspond to what you can see in the [View All Changes] dialog box.