About this task
The Extract values by semantic type function lets you select up to five semantic types, each corresponding to a kind of information you want to extract from a given field. It works with semantic types based on regular expressions or dictionaries, as well as with compound semantic types.
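To make the difference between the two basic kinds of semantic types more concrete, here is a minimal Python sketch. It is not how Talend Cloud Data Preparation implements them: the email pattern is deliberately simplified, the country list is a tiny made-up dictionary, and the names `EMAIL_PATTERN`, `COUNTRY_DICTIONARY`, `find_by_regex`, and `find_by_dictionary` are hypothetical.

```python
import re

# Assumption: a simplified pattern standing in for a regex-based semantic type.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumption: a tiny word list standing in for a dictionary-based semantic type.
COUNTRY_DICTIONARY = {"france", "italy", "spain"}


def find_by_regex(text):
    """Return every substring of text that matches the email pattern."""
    return EMAIL_PATTERN.findall(text)


def find_by_dictionary(text):
    """Return the words of text that appear in the country dictionary."""
    words = re.findall(r"[A-Za-z]+", text)
    return [word for word in words if word.lower() in COUNTRY_DICTIONARY]


comment = "Loved it! Contact me at jane.doe@example.com, I also visited Italy."
emails = find_by_regex(comment)          # regex-based detection
countries = find_by_dictionary(comment)  # dictionary-based detection
print(emails, countries)
```

A regex-based type recognizes values by their shape, while a dictionary-based type recognizes them by membership in a list of known values. A compound type would combine several such checks.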
For this example, imagine that you work for the Ministry of Culture and need to prepare data from a survey issued to museum visitors. The survey gathered basic demographic information about the visitors, such as their age or gender, along with free-text comments entered in a dedicated field. Visitors could use this comments field to share their experience, leave contact information, or recommend museums they have visited in other countries. This information could be used, for example, to build future partnerships.
However, after a simple parsing operation, all the information gathered in the comments field ended up in a single column of the resulting dataset. You would like to extract the different types of information and sort them into specific columns. To accomplish that, you will use the Extract values by semantic type function, together with the predefined or custom semantic types available in Talend Cloud Data Preparation, to identify the different categories of information left in the comments and extract them to individual columns.
- Click the header of the Comments column to select its content.
- In the functions panel, type Extract values by semantic type and click the result to open the options for the associated function.
- In the first Semantic type drop-down list, select Museum.
All the semantic types that are available in the drop-down list correspond to either the predefined semantic types, or the custom ones you created using Talend Dictionary Service. Each category will be extracted to a new column.
- In the second and third Semantic type drop-down lists, select Country and Email.
These three categories correspond to the types of information that you hope museum visitors left in the comments field.
- Select the Normalize value check box to standardize the extracted values based on the default or custom dictionary-based and compound semantic types.
- Click Submit.
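The overall effect of this procedure can be sketched in Python as follows. This is only an illustration under stated assumptions, not the product's actual behavior: the dictionaries, the simplified email pattern, the output column names (`Comments_Museum`, `Comments_Country`, `Comments_Email`), and the helper functions are all hypothetical, and normalization is modeled simply as replacing a matched word with its canonical dictionary spelling.

```python
import re

# Assumption: small stand-ins for the Museum, Country, and Email semantic types.
MUSEUMS = {"louvre": "Louvre", "prado": "Prado", "uffizi": "Uffizi"}
COUNTRIES = {"france": "France", "italy": "Italy", "spain": "Spain"}
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_from_dictionary(text, dictionary, normalize):
    """Return the first word of text found in dictionary, or an empty string."""
    for word in re.findall(r"[A-Za-z]+", text):
        canonical = dictionary.get(word.lower())
        if canonical is not None:
            # With normalization, return the dictionary spelling;
            # otherwise return the word exactly as the visitor typed it.
            return canonical if normalize else word
    return ""


def extract_email(text):
    """Return the first email-like substring of text, or an empty string."""
    match = EMAIL.search(text)
    return match.group(0) if match else ""


def extract_values(rows, normalize=True):
    """Copy each row and add one new column per selected semantic type."""
    result = []
    for row in rows:
        comment = row["Comments"]
        new_row = dict(row)  # the original Comments column is kept
        new_row["Comments_Museum"] = extract_from_dictionary(comment, MUSEUMS, normalize)
        new_row["Comments_Country"] = extract_from_dictionary(comment, COUNTRIES, normalize)
        new_row["Comments_Email"] = extract_email(comment)
        result.append(new_row)
    return result


survey = [
    {"Age": 34, "Comments": "The PRADO in spain was great, write to ana@example.org"},
    {"Age": 51, "Comments": "Nice visit."},
]
normalized = extract_values(survey, normalize=True)
raw = extract_values(survey, normalize=False)
for row in normalized:
    print(row)
```

In this sketch, a comment that contains no value of a given type simply yields an empty cell in the corresponding new column.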