Regular expressions let you search your data for a specific pattern
and isolate the values that you are interested in.
This scenario takes the example of someone working on a dataset that lists information
about books, including their ISBNs. Using Talend Data Preparation, you can check
whether the ISBNs are valid and follow the expected pattern. With the Matches Pattern
function, you can compare your data with an expression of your choice.
Procedure
-
Click the ISBN column to select its content.
-
In the functions list, find and select Matches Pattern....
A menu opens where you can enter the pattern for your search.
-
In the Pattern field, select other from the drop-down list.
-
Click the button on the left side of the Manual pattern
field and select RegEx from the list.
-
In the Manual pattern field, type the following expression:
^[ISBN]{4}[ ]{0,1}[0-9]{1}[-]{1}[0-9]{3}[-]{1}[0-9]{5}[-]{1}[0-9]{0,1}$
This regular expression corresponds to the ISBN format that you want to
identify in your dataset.
-
Click Submit.
A new column ISBN_MATCHING is created, where the
values that match the pattern defined by the regular expression are listed
as true. The values that do not match are listed as
false.
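To see which values the expression accepts before applying it to a dataset, the check can be approximated outside Talend Data Preparation. The following is a minimal sketch that uses Python's re module as a stand-in for the Matches Pattern function. The matches_pattern helper and the sample values are hypothetical and are not part of the product, and the regular expression engine used by Talend Data Preparation may differ in edge cases.

```python
import re

# Pattern copied from the Manual pattern field in the procedure above.
ISBN_PATTERN = re.compile(
    r"^[ISBN]{4}[ ]{0,1}[0-9]{1}[-]{1}[0-9]{3}[-]{1}[0-9]{5}[-]{1}[0-9]{0,1}$"
)


def matches_pattern(value: str) -> bool:
    """Hypothetical helper: True when the whole value matches the pattern."""
    return ISBN_PATTERN.match(value) is not None


# Made-up sample values, for illustration only.
sample_isbns = [
    "ISBN 0-123-45678-9",  # prefix, space, hyphenated digit groups
    "ISBN0-123-45678-9",   # the space after the prefix is optional
    "ISBN 0-123-45678-",   # the final digit is optional ({0,1})
    "ISBN 0123456789",     # no hyphens
    "978-0-123-45678-9",   # no ISBN prefix
    "NBSI 0-123-45678-9",  # [ISBN]{4} is a character class, not a literal word
]

# Equivalent of the ISBN_MATCHING column: one boolean per input value.
isbn_matching = {value: matches_pattern(value) for value in sample_isbns}

for value, result in isbn_matching.items():
    print(f"{value!r}: {str(result).lower()}")
```

In this sketch, the first three values and the last one match, while the values without hyphens or without a prefix do not. The last value matches because [ISBN]{4} accepts any four characters drawn from I, S, B, and N, in any order. If only the literal prefix should be accepted, starting the expression with ^ISBN instead of ^[ISBN]{4} is one possible stricter variant.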
Results
After using a regular expression to search for a specific pattern, you can now easily
identify and isolate the values that match your search.