Scenario 2: Retrieving modified data using CDC - 6.3

Talend Components Reference Guide

EnrichVersion
6.3
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for Data Quality
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
task
Data Governance
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

This scenario is based on the preceding one. It continuously populates and modifies the data stored in the CRM warehouse, and retrieves and saves, every night, these modifications in a dedicated table using the CDC function. These modifications could be then extracted by the various concerned departments.

Configuring CDC

Before being able to retrieve modified data from the CRM data warehouse, you must:

  1. Set up the database connection dedicated to CDC,

  2. Set up a database connection to the source data and identify the table to catch,

  3. Set the connection between the CDC and the data.

Create connections and subscribers

  1. In the Repository tree view and under Metadata, create a connection to your database dedicated to CDC, in this scenario CDC_connection.

    Note

    Ensure that the database connection for CDC is on the same server with the source data to which changes are to be captured.

  2. In the Repository tree view and under Metadata, create a connection to the source data warehouse and identify the table to catch, in this scenario CRM_connection.

  3. Right-click the CRM connection and select Retrieve schema from the drop-down menu to retrieve the schema of the table to catch.

  4. Right-click CDC Foundation of CRM and select Create CDC in the drop-down menu.

    The [Create Change Data Capture] dialog box displays

  5. In the Set link Connection field, select CDC_connection.

  6. Click Create Subscriber. The [Create Subscriber and Execute SQL Script] dialog box displays.

  7. Click Execute and then Close.

  8. Click Finish to validate the creation of the subscriber table.

    In the CDC Foundation folder, the relevant subscriber table displays.

Specify which table the subscriber wants to subscribe to and then activate the subscription

  1. Right-click the Leafact schema in the source CRM and select Add CDC in the drop-down list. The [Create Subscriber and Execute SQL Script] dialog box displays.

  2. In the Events to catch check boxes, select Insert, Update and Delete to catch inserted, updated or deleted data.

  3. In the Subscriber Name field, enter the name of the subscriber that will have access to the modifications, in this scenario Sub_Mktg for the Marketing department.

  4. Click Execute and then Close to validate the subscription.

    In the CDC Foundation folder, the two created tables display and the schema node of the catched table is marked with a green CDC symbol.

Create the new subscribers Sub_Finance and Sub_Sales for the Treasury and Sales departments respectively

  1. Right-click Leadfact and select Edit CDC Subscribers from the drop-down list. The [Edit CDC] dialog box displays.

  2. Click Add. The [Input subscriber name] dialog box displays.

  3. Enter the name of the subscriber, in this scenario Sub_Finance and Sub_Sales.

  4. Click Execute and then Close to validate the creation operation.

Modifying the CRM data

Modify the data of your customers in your CRM, for example, convert all customer names to upper case.

  1. Double-click the tMap component and enter row1.CustomerName.toUpperCase()in front of the CustomerName column to convert all customer names to upper case.

  2. Click Ok.

  3. Double-click the tMysqlOutput component.

  4. In the Action on table field, select None.

  5. In the Action on data field, select Insert or update to insert or update table data.

  6. Save your job and press F6 to execute the job.

To view all changes done on data, right-click the Leadfact table and select View All Changes to open the relevant dialog box.

Extracting change data

After setting up the CDC environment, you can now design a job using the Mysql CDC component to incrementally extract the change data from the Leadfact table. To do that:

  1. From the Palette, drop the tMysqlCDC and tLogRow components to the design workspace.

  2. Link the two components using a Row Main link.

  3. Double-click the tMysqlCDC component to define its properties.

  4. Set Property Type to Repository and then select the select the schema corresponding to your Mysql database table, CDC_connection in this scenario. The connection details will display automatically in the corresponding fields

    Note

    If you have not stored the CRM data warehouse connection details in the Metadata entry in the Repository, select Built-in in the property type list and set the connection details manually.

  5. In the Schema using CDC field, select Repository and then select the schema of the Leadfact table stored in the Metadata entry.

  6. In the Table using CDC field, enter the name of the table captured by the CDC, in this scenario Leadfact.

  7. In the Subscriber field, enter the name of the subscriber that will extract modified data, Sub_Mktg, Sub_Sales, and Sub_Finance for the Marketing, Sales and Treasury Departments respectively.

  8. In the Events to catch field, select the check boxes corresponding to the type of the modified data the subscriber will extract. In this scenario, select the three check boxes for the three subscribers.

  9. Double-click the tLogRow component to set is properties.

  10. Click the Sync columns button to retrieve the schema from the preceding component.

  11. Save your job and press F6 to execute it.

The customer names are converted to upper case and the modification type displays here is U to stand for Update.

Once these modifications are extracted, they are no more available in the modified table. To verify their extraction, right-click the Leadfact table catched by the CDC and then select Views All Changes. The extracted changes do not display anymore.