Managing regular expressions and SQL patterns - 6.1

Talend Real-time Big Data Platform Studio User Guide

EnrichVersion
6.1
EnrichProdName
Talend Real-Time Big Data Platform
task
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

The management procedures of regular expressions and SQL patterns include operations like creating, testing, duplicating, importing and exporting.

The sections below explain in detail each of the management option for regular expressions and SQL patterns. Management processes for both types of patterns are exactly the same.

How to create a new regular expression or SQL pattern

You can create new regular expressions or SQL patterns, including those for Java to be used in column analyses.

Note

Management processes for regular expressions and SQL patterns are the same. The procedure below with all the included screen captures reflect the steps to create a regular expression. You can follow the same steps to create an SQL pattern.

Prerequisite(s): You have selected the Profiling perspective of the studio.

To create a new pattern, do the following:

  1. In the DQ Repository tree view, expand Libraries > Patterns, and then right-click Regex.

  2. From the contextual menu, select New Regex Pattern to open the corresponding wizard.

    When you open the wizard, a help panel automatically opens with the wizard. This help panel guides you through the steps of creating new regular patterns.

  3. In the Name field, enter a name for this new regular expression.

    Note

    Avoid using special characters in the item names including:

    "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">".

    These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

  4. If required, set other metadata (purpose, description and author name) in the corresponding fields and click Next.

  5. In the Regular expression field, enter the definition of the regular expression to be created. The regular expression must be surrounded by single quotes.

    Note

    For the PostgreSQL database, regular expressions are not compatible among the database different versions.

    If you want to use the regular expression with PostgreSQL version 9.1 or greater, you must either:

    • in the PostgreSQL database configuration, set the standard_conforming_strings parameter to off and write double backslashes in the definition, or

    • in the Regular expression field in the wizard, use a single backslash in the expression definition.

    For further information about PostgreSQL regular expressions, select Window > Show View, expand Help and then select Bookmarks.

  6. From the Language Selection list, select the language (a specific database or Java).

    Warning

    If you select a database from the Language Selection list, you will be able to generate ELT Jobs on the column analysis results to recuperate valid and invalid rows.

    If you select the Java or the Default language, you will be able to generate an ETL Job on the column analysis results.

    For further information on how to generate a Job to recuperate valid, invalid or both types of rows, see Recuperating valid and invalid rows in a column analysis .

  7. Click Finish to close the dialog box.

    A subfolder for this new regular expression is listed under the Regex folder in the DQ Repository tree view, and the pattern editor opens with the defined metadata and the defined regular expression.

  8. In the Pattern Definition view, click the [+] button and add as many regular expressions as necessary in the new pattern.

    You can define the regular expressions specific to any of the available databases or specific to Java.

    Note

    If the regular expression is simple enough to be used in all databases, select Default from the list.

    Subfolders labeled with the specified database types or Java are listed below the name of the new pattern under the Patterns folder in the DQ Repository tree view.

  9. Save the new pattern.

    Once the pattern is created, you can drop it directly onto a database column in the open analysis editor.

  10. If required, click the pattern name to display its detail in the Detail View and Technical view in the Studio.

    Note

    In the pattern editor, you can click Test next to the regular expression to test the regular pattern definition. For more information, see How to test a regular expression in the Pattern Test View. Also, from the [Pattern Test View], you can create a new pattern based on the regular expression you are testing. For further information, see How to create a new pattern from the Pattern Test View .

How to test a regular expression in the Pattern Test View

It is possible to test character sequences against a predefined or newly created regular expression.

Prerequisite(s): At least one database connection is set in the Profiling perspective of the studio.

To test a character sequence against a regular expression, do the following:

  1. Follow the steps outlined in How to create a new regular expression or SQL pattern to create a new regular expression.

  2. In the open pattern editor, click Pattern Definition to open the relevant view.

  3. Click the Test button next to the definition against which you want to test a character sequence to proceed to the next step.

    The test view is displayed in the Studio showing the selected regular expression.

  4. In the Test Area, enter the character sequence you want to check against the regular expression

  5. From the DB Connection list, select the database in which you want to use the regular expression.

    Note

    If you select to test a regular expression in Java, the Java option will be selected by default and the DB Connections option and list will be unavailable in the test view.

  6. Click Test.

    An icon is displayed in the upper left corner of the view to indicate if the character sequence matches or does not match the selected pattern definition.

  7. If required, modify the regular expression according to your needs and then click Save to save your modifications.

    The pattern definition is modified accordingly in the pattern editor.

Note

You can create/modify patterns directly from the Pattern Test View via the Create Pattern button. For further information, see How to create a new pattern from the Pattern Test View

How to create a new pattern from the Pattern Test View

You can create your own customized patterns from the [Pattern Test View]. The advantage of creating a pattern from this view is that you can create your customized pattern based on an already tested regular expression. All you need to do is to customize the expression definition according to your needs before saving the new pattern.

Prerequisite(s): You have selected the Profiling perspective in the studio.

To create a new pattern based on a predefined or a newly created regular expression, do the following:

  1. In the DQ Repository tree view, expand Libraries > Patterns > Regex and double-click the pattern you want to use to create your customized pattern.

    The pattern editor opens in the studio.

  2. Click Test next to the definition you want to use as a base to create the new pattern.

    The [Pattern Test View] is opened on the definition of the selected regular expression.

  3. If required, test the regular expression through entering text in the Test Area. For further information, see How to test a regular expression in the Pattern Test View.

  4. Click Create Pattern to open the [New Regex pattern] wizard.

  5. In the Name field, enter a name for this new regular expression.

    Note

    Avoid using special characters in the item names including:

    "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">".

    These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

  6. If required, set other metadata (purpose, description and author name) in the corresponding fields and click Next to proceed to the next step.

    The definition of the initial regular expression is already listed in the Regular expression field.

  7. Customize the syntax of the initial regular expression according to your needs. The regular expression definition must be surrounded by single quotes.

    Note

    For the PostgreSQL database, regular expressions are not compatible among the database different versions.

    If you want to use the regular expression with PostgreSQL version 9.1 or greater, you must either:

    • in the PostgreSQL database configuration, set the standard_conforming_strings parameter to off and write double backslashes in the definition, or

    • in the Regular expression field in the wizard, use a single backslash in the expression definition.

    For further information about PostgreSQL regular expressions, select Window > Show View, expand Help and then select Bookmarks.

  8. From the Language Selection list, select the database in which you want to use the new regular expression.

  9. Click Finish to close the wizard.

    A subfolder for the new pattern is listed under the Regex folder in the same file of the initial regular pattern. The pattern editor opens on the pattern metadata and pattern definition.

How to generate a regular expression from the Date Pattern Frequency Table

You can generate a regular pattern from the results of an analysis that uses the Date Pattern Frequency Table indicator on a date column.

Prerequisite(s): In the Profiling perspective of the studio, a column analysis is created on a date column using the Date Pattern Frequency Table indicator.

Warning

To be able to use the Date Pattern Frequency Table indicator on date columns, you must set the execution engine to Java in the Analysis Parameter view of the column analysis editor. For more information on execution engines, see Using the Java or the SQL engine.

For more information on how to create a column analysis, see Creating a basic analysis on a database column.

To generate a regular expression from the results of a column analysis, do the following:

  1. In the DQ Repository tree view, right-click the column analysis that uses the date indicator on a date column.

  2. Select Open from the contextual menu to open the corresponding analysis editor.

  3. Press F6 to execute the analysis and display the analysis results in the Graphics panel to the right of the Studio.

  4. At the bottom of the editor, click the Analysis Results tab to display a more detailed result view.

    In this example, 100.00% of the date values follow the pattern yyyy MM dd and 39.41% follow the pattern yyyy dd MM.

  5. Right-click the date value for which you want to generate a regular expression and select Generate Regex Pattern from the contextual menu.

    The [New Regex Pattern] dialog box is displayed.

  6. Click Next.

    The date regular expression is already defined in the corresponding field.

  7. Click Finish to proceed to the next step.

    The pattern editor opens with the defined metadata and the generated pattern definition.

    The new regular expression is listed under Pattern > Regex in the DQ Repository tree view. You can drag it onto any date column in the analysis editor.

  8. If required, click the Test button to test a character sequence against this date regular expression as outlined in How to test a regular expression in the Pattern Test View.

How to edit a regular expression or an SQL pattern

You can open the editor of any regular expression or SQL pattern to check its settings and/or edit its definition in order to:

  • adapt it to a specific database type, or

  • adapt it to a specific use.

Prerequisite(s): You have selected the Profiling perspective in the studio.

To open/edit a regular expression or an SQL pattern, do the following:

  1. In the DQ Repository tree view, expand Libraries > Patterns.

  2. Browse through the regular expression or SQL pattern lists to reach the expression or pattern you want to open/edit.

  3. Right-click its name and select Open from the contextual menu.

    The pattern editor opens displaying the regular expression or SQL pattern settings.