Using regular expressions to match content - 8.0

Talend Data Preparation User Guide

Version: 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Preparation
Content: Data Quality and Preparation > Cleansing data
Last publication date: 2024-03-26

Regular expressions let you search your data for a specific pattern and isolate the values that you are interested in.

This scenario uses a dataset that lists information about books, including their ISBNs. With Talend Data Preparation, you can check whether the ISBNs follow the expected format. The Matches Pattern function compares your data with an expression of your choice.

Procedure

  1. Click the ISBN column to select its content.
  2. In the functions list, find and select Matches Pattern....

    A menu opens where you can enter the pattern for your search.

  3. In the Pattern field, select other from the drop-down list.
  4. Click the button on the left side of the Manual pattern field and select RegEx from the list.
  5. In the Manual pattern field, type ^[ISBN]{4}[ ]{0,1}[0-9]{1}[-]{1}[0-9]{3}[-]{1}[0-9]{5}[-]{1}[0-9]{0,1}$.

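    This regular expression describes the ISBN format that you want to identify in your dataset: four characters from the set I, S, B, N, an optional space, one digit, a hyphen, three digits, a hyphen, five digits, a hyphen, and an optional final digit. Because [ISBN]{4} is a character class rather than the literal text ISBN, it also accepts other combinations of those four letters, such as SBIN.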

  6. Click Submit.

    A new column named ISBN_MATCHING is created. Values that match the pattern defined by the regular expression are listed as true, and values that do not match are listed as false.
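
To preview how the pattern from step 5 treats different values before you apply it, you can try it outside the application. The following is a minimal sketch in Python, not part of Talend Data Preparation: Python's re module stands in for the product's own regular expression engine (an assumption; the syntax used in this pattern is common to most engines), and the sample values and the matches_pattern helper are hypothetical.

```python
import re

# Pattern copied from step 5 of the procedure.
ISBN_PATTERN = re.compile(
    r"^[ISBN]{4}[ ]{0,1}[0-9]{1}[-]{1}[0-9]{3}[-]{1}[0-9]{5}[-]{1}[0-9]{0,1}$"
)


def matches_pattern(value: str) -> bool:
    """Return True when the whole value matches the pattern (hypothetical helper)."""
    return ISBN_PATTERN.match(value) is not None


# Hypothetical sample values, not taken from a real dataset.
samples = [
    "ISBN 1-234-56789-0",  # prefix, space, then 1-3-5-1 digit groups
    "ISBN1-234-56789-",    # the space and the final digit are both optional
    "ISBN 12-34-56789-0",  # wrong group lengths
    "1-234-56789-0",       # missing prefix
    "SBIN 1-234-56789-0",  # [ISBN]{4} is a character class, so this is accepted too
]

# Mimics the true/false content of the ISBN_MATCHING column.
isbn_matching = {value: matches_pattern(value) for value in samples}

for value, result in isbn_matching.items():
    print(f"{value!r}: {result}")
```

The first two samples and the last one are reported as matches, while the third and fourth are not. If the last result is not what you want, replacing [ISBN]{4} with the literal text ISBN makes the prefix check stricter.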

Results

After using a regular expression to search for a specific pattern, you can now easily identify and isolate the values that match your search.
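
The idea of isolating values based on the new column can be sketched outside the application as follows. This is an illustration in Python under stated assumptions, not how Talend Data Preparation works internally: the book rows, the add_matching_column helper, and the use of Python's re module are all hypothetical stand-ins.

```python
import re

# Pattern copied from step 5 of the procedure.
ISBN_PATTERN = re.compile(
    r"^[ISBN]{4}[ ]{0,1}[0-9]{1}[-]{1}[0-9]{3}[-]{1}[0-9]{5}[-]{1}[0-9]{0,1}$"
)

# Hypothetical rows standing in for the books dataset.
books = [
    {"title": "Book A", "ISBN": "ISBN 1-234-56789-0"},
    {"title": "Book B", "ISBN": "1234567890"},
    {"title": "Book C", "ISBN": "ISBN 0-306-40615-2"},
]


def add_matching_column(rows):
    """Return copies of the rows with an added boolean ISBN_MATCHING field."""
    return [
        {**row, "ISBN_MATCHING": ISBN_PATTERN.match(row["ISBN"]) is not None}
        for row in rows
    ]


flagged = add_matching_column(books)

# Split the rows on the new column to isolate each group.
matching = [row["title"] for row in flagged if row["ISBN_MATCHING"]]
non_matching = [row["title"] for row in flagged if not row["ISBN_MATCHING"]]

print("Matching:", matching)
print("Not matching:", non_matching)
```

Keeping the result as a separate boolean field, rather than dropping rows immediately, mirrors the procedure above: the original values stay untouched and the non-matching rows remain available for correction.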