Managing User-Defined Functions in databases - 6.2

Talend Data Services Platform Studio User Guide

EnrichVersion
6.2
EnrichProdName
Talend Data Services Platform
task
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

The regular expression function is built in several databases, but many other databases do not support it. The databases that natively support regular expressions include: MySQL, PostgreSQL, Oracle 10g, and Ingres while Microsoft SQL server and Netezza do not, for example.

A different case is when the regular expression function is built in the database but the query template of the regular expression indicator is not defined.

From Profiling perspective of the studio, you can:

For information about using regular expressions with Teradata, check the article How to configure regular expressions on Teradata.

For information about using regular expressions with Netezza, see What you need to know about some databases.

How to declare a User-Defined Function in a specific database

The regular expression function is not built into all different database environments. If you want to use the studio to analyze columns against regular expressions in databases that do not natively support regular expressions, you can:

Either:

  1. Install the relevant regular expressions libraries on the database. For an example of creating a regular expression function on a database, see Regular expressions on SQL Server.

  2. Create a query template for the database in the studio. For more information, see How to define a query template for a specific database.

Or:

  • Execute the column analysis using the Java engine. In this case, the system will use the Java regular expressions to analyze the specified column(s) and not SQL regular expressions. For more information on the Java engine, see Using the Java or the SQL engine.

How to define a query template for a specific database

A query template defines the query logic required to analyze columns against regular expressions. The steps to define a query template in the studio include the following:

  • Create a query template for a specific database,

  • Set the database-specific regular expression if this expression is not simple enough to be used with all databases.

The below example shows how to define a query template specific for the Microsoft SQL Server database. Regular expressions on SQL Server gives a detailed example on how to create a user-defined regular expression function on an SQL server.

Prerequisite(s): You have selected the Profiling perspective in the studio.

To define a query template for a specific database, do the following:

  1. In the DQ Repository tree view, expand Libraries > Indicators.

  2. Expand System Indicators > Pattern Matching.

  3. Double-click Regular Expression Matching, or right-click it and select Open from the contextual menu.

    The corresponding view is displayed to show the indicator metadata and its definition.

    You need now to add to the list of databases the database for which you want to define a query template. This query template will compute the regular expression matching.

  4. Click the [+] button at the bottom of the Indicator Definition view to add a field for the new template.

  5. In the new field, click the arrow and select the database for which you want to define the template. In this example, select Ingres.

  6. Copy the indicator definition of any of the other databases.

  7. Click the Edit... button next to the new field.

    The [Edit expression] dialog box is displayed.

  8. Paste the indicator definition (template) in the Expression box and then modify the text after WHEN in order to adapt the template to the selected database. In this example, replace the text after WHEN with WHEN REGEX.

  9. Click OK. The new template is displayed in the field.

  10. Click the save icon on top of the editor to save your changes.

You have finalized creating the query template specific for the Ingres database. You can now start analyzing the columns in this database against regular expressions.

If the regular expression you want to use to analyze data on this server is simple enough to be used with all databases, you can start your column analyses immediately. If not, you must edit the definition of the regular expression to work with this specific database, Ingres in this example.

If an analysis with a user-defined indicator runs successfully at least one time and later the indicator definition template for the database is deleted, the analysis does not fail. It keeps running successfully because it uses the previously generated SQL query.

For more information on how to set the database-specific regular expression definition, see How to edit a regular expression or an SQL pattern and How to duplicate a regular expression or an SQL pattern.

How to edit a query template

You can edit the query template you create for a specific database.

Prerequisite(s): You have selected the Profiling perspective in the studio.

To edit a query template for a specific database, do the following:

  1. In the DQ Repository tree view, expand Libraries > Indicators.

  2. Expand System Indicator > Pattern Matching.

  3. Double-click Regular Expression Matching, or right-click it and select Open from the contextual menu.

    The corresponding view is displayed to show the indicator metadata and its definition.

  4. Click the button next to the database for which you want to edit the query template.

    The [Edit expression] dialog box is displayed.

  5. In the Expression area, edit the regular expression template as required and then click OK to close the dialog box and proceed to the next step.

    The regular expression template is modified accordingly.

How to delete a query template

You can delete the query template you create for a specific database.

Prerequisite(s): You have selected the Profiling perspective in the studio.

To delete a query template for a specific database, do the following:

  1. In the DQ Repository tree view, expand Libraries > Indicators.

  2. Expand System Indicators > Pattern Matching.

  3. Double-click Regular Expression Matching, or right-click it and select Open from the contextual menu.

    The corresponding view is displayed to show the indicator metadata and its definition.

  4. Click the button next to the database for which you want to delete the query template.

    The selected query template is deleted from the list in the Indicator definition view.