Scenario 2: Checking customer table against a given DQ rule to select customer records - 6.1

Talend Components Reference Guide

EnrichVersion
6.1
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for Data Quality
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
task
Data Governance
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

This scenario is a three-component Job that checks a MySQL customer table against a given data quality rule so as to query specific customer records.

These components used in the Job are:

  • tMysqlConnection: this component creates the connection to the MySQL database of interest so that the other components can reuse it.

  • tMySQLInvalidRows: this component reuses the connection created by tMysqlConnection and query the customer records of interest against the given data quality rule.

  • tLogRow: this component presents the result of running this Job.

To replicate this scenario, proceed as illustrated in the following sections:

Dropping and linking the components

  1. From the Palette, drop tMysqlConnection, tMySQLInvalidRows and tLogRow onto the design workspace.

    You can label any component in the workspace the way you need. For further information about how to label a component, see Talend Studio User Guide.

  2. Right-click tMysqlConnection and select the Trigger > OnSubjobOk link to connect this component to tMySQLInvalidRows.

  3. Repeat this operation but select the Row > Main link to connect tMySQLInvalidRows to tLogRow.

Configuring the connection to the database

To configure the connection to the MySQL database of interest, proceed as follows:

  1. Double-click tMysqlConnection to open its Component view.

  2. In the DB version field, select the version of your MySQL database. It is Mysql 5 in this example.

  3. In the Host field, type in the server IP address of the database to which you want to connect, localhost in this example.

  4. In the Port field, type in the port number of the database to which you want to connect, 3306 in this example.

  5. In the Database field, enter the name of the database in which you need to check the customer table, crm in this example.

  6. In the Username and the Password fields, enter the authentication information to connect to the database of interest.

Configuring the query of the customer records of interest

  1. Double-click tMySQLInvalidRows to open its Component view.

  2. In the Validation type field, select DQ rule validation in order to use the given data quality rule.

  3. Select Use an existing connection to reuse the connection that tMysqlConnection creates.

  4. Click the [...] button next to Edit schema to open the schema editor.

  5. Click the [+] button three times to add throw rows and rename them as Name, DOB and Email, respectively.

  6. In the DOB row, select VARCHAR in the DB type column, enter 19 in the Length column and 0 in the Precision column.

  7. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.

  8. In the Table name field, enter or browse to the database table to be checked, cust in this example.

  9. In the Where clause field, type in the where clause to be used, in addition to the given data quality rule, to query the customer data of interest. In this scenario, enter `cust`.`Email` like 's%' within the quotation marks to retrieve the email records beginning with the letter s or S.

  10. In the DQ rule list field, select the data quality rule to be used. In this scenario, select cust_age. This rule is a demo rule reading

    18<=((TO_DAYS(NOW())-TO_DAYS(DOB))/365)

    You can check the available data quality rules in Libraries > Rules > SQL of the DQ Repository. The cust_age rule can be imported from the TDQEEDEMOJAVA project.

    For further information about the DQ Repository and its items and about how to import the data quality demo project, see the Talend Studio User Guide

  11. Click Guess Query to generate the corresponding query clause that uses the Not clause.

  12. In the Query field, change 18 to 20 to modify the clause as the following:

    "SELECT `Name`, `DOB`, `Email` FROM `crm`.`cust` 
      WHERE (NOT ( 20>((TO_DAYS(NOW())-TO_DAYS(DOB))/365) )
      AND `cust`.`Email` like 's%')"

    This clause allows you to select the customers whose ages count 20 or more.

Executing the Job

The tLogRow component presents the execution result of the Job. You can configure the presentation mode on its Component view.

To do this, double-click tLogRow to open the Component view and in the Mode area, select the Table (print values in cells of a table) option.

  • To execute this Job, press F6.

Once done, the Run view is opened automatically, where you can check the execution result.

You can read that the customer records beginning with the letter S are retrieved and their ages are all more than 20.