Regular expressions can be used to search for a specific pattern among your data
and isolate values that you are interested in.
This scenario takes the example of someone working on a dataset that lists information
about books, including their ISBN numbers. Using Talend Data Preparation, you can check
whether the ISBN values follow the expected pattern. With the Match pattern function,
you can compare your data with an expression of your choice.
Procedure
-
Click the ISBN column to select its content.
-
In the functions list, find and select Match pattern....
A menu opens where you can enter the pattern for your search.
-
In the Pattern field, select other from the drop-down list.
-
Click the button on the left side of the Manual pattern field and select Regex from the list.
-
In the Manual pattern field, type the following regular expression:
^[ISBN]{4}[ ]{0,1}[0-9]{1}[-]{1}[0-9]{3}[-]{1}[0-9]{5}[-]{1}[0-9]{0,1}$
This regular expression describes the ISBN format that you want to
identify in your dataset.
-
Click Submit.
A new column ISBN_matching is created, where the values that match the
pattern defined by the regular expression are listed as true. The values
that do not match are listed as false.
Results
After using a regular expression to search for a specific pattern, you can now easily
identify and isolate the values that match your search.
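To see how the regular expression from the procedure behaves, the following minimal sketch reproduces the true/false result outside Talend Data Preparation. It assumes Python and its standard re module, whose handling of this particular pattern should be equivalent but may not be identical to the engine used by the tool. The function name isbn_matching and the sample values are hypothetical and only serve as an illustration.

```python
import re

# Pattern copied from the procedure; "[ ]{0,1}" is an optional single space.
ISBN_PATTERN = re.compile(
    r"^[ISBN]{4}[ ]{0,1}[0-9]{1}[-]{1}[0-9]{3}[-]{1}[0-9]{5}[-]{1}[0-9]{0,1}$"
)


def isbn_matching(values):
    """Return one boolean per value, similar to the ISBN_matching column."""
    return [bool(ISBN_PATTERN.match(value)) for value in values]


# Hypothetical sample values, not taken from a real dataset.
sample = [
    "ISBN 0-306-40615-2",  # expected format
    "ISBN0-306-40615-2",   # the space after the prefix is optional
    "0-306-40615-2",       # missing prefix
    "ISBN 0-306-4061-2",   # third group has 4 digits instead of 5
]

for value, matched in zip(sample, isbn_matching(sample)):
    print(f"{value!r}: {matched}")
```

Note that [ISBN]{4} is a character class repeated four times, so it accepts any four-letter combination of I, S, B, and N, such as NBSI. If only the literal prefix should match, replacing it with the literal text ISBN would make the pattern stricter.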