You can delete a semantic type in Talend Dictionary Service to remove it from the list of recognized data types in Talend Data Preparation.
This applies to both predefined and custom semantic types.
The variety of semantic types present by default in Talend Data Preparation may not suit your business context. For example, a five-digit number can be interpreted as an American ZIP code, but also as a French or German one, since they share the same format.
Talend Data Preparation tends to automatically match five-digit numbers with French ZIP codes. Let's say that you work for an American company and only deal with data coming from American clients, including ZIP codes. Always getting the wrong semantic type in the columns containing ZIP codes can quickly become annoying.
In this example, the ZIP column of the dataset you are preparing can be matched with at least four types.
Using Talend Dictionary Service, you will simply remove every semantic type that matches the five-digit format except US_POSTAL_CODE. The change is then propagated instantly to Talend Data Preparation, and five-digit numbers will automatically be identified as US ZIP codes from now on.
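To see why a single ZIP value can match several semantic types at once, here is a minimal Python sketch of regex-based matching. It is an illustration under stated assumptions, not Talend's implementation: the patterns are hypothetical, simplified stand-ins rather than the expressions actually stored in Talend Dictionary Service, DE_POSTAL_CODE is a hypothetical name, and the sketch assumes a value must match a pattern in full.

```python
import re

# Hypothetical, simplified patterns; the real regular expressions used by
# Talend Dictionary Service may differ. DE_POSTAL_CODE is an illustrative name.
SEMANTIC_TYPES = {
    "FR_POSTAL_CODE": r"\d{5}",
    "DE_POSTAL_CODE": r"\d{5}",
    "US_POSTAL_CODE": r"\d{5}(?:-\d{4})?",  # also accepts ZIP+4, e.g. 10001-1234
}


def matching_types(value: str, types: dict[str, str]) -> list[str]:
    """Return the name of every type whose pattern matches the whole value."""
    return [name for name, pattern in types.items() if re.fullmatch(pattern, value)]


# A plain five-digit value is ambiguous: all three types match it.
print(matching_types("10001", SEMANTIC_TYPES))

# Keeping only US_POSTAL_CODE (i.e. deleting the others) removes the ambiguity.
remaining = {name: p for name, p in SEMANTIC_TYPES.items() if name == "US_POSTAL_CODE"}
print(matching_types("10001", remaining))
```

With all three types present, "10001" matches every one of them, which is the ambiguity this procedure resolves; once the competing types are removed, only US_POSTAL_CODE remains as a candidate.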
- Open a command prompt window.
- Using the cd command, go to the <Dictionary_Service_Path>/command-line folder.
- To display the names of the existing semantic types and see which ones to remove, execute the following command according to your operating system:
category_manager.bat -l -type REGEX for Windows.
./category_manager.sh -l -type REGEX for Linux.
You are prompted for your Talend Administration Center credentials. The command is executed after you enter a valid login and password.
The list of semantic types based on regular expressions is displayed. You can identify the names of the ones you want to remove.
- To remove the French postal code semantic type, execute the following command according to your operating system:
category_manager.bat -d -name FR_POSTAL_CODE for Windows.
./category_manager.sh -d -name FR_POSTAL_CODE for Linux.
FR_POSTAL_CODE has been removed from the list of recognized semantic types, and five-digit numbers will no longer be associated with French ZIP codes.
- Repeat this operation to remove the other semantic types that match five-digit numbers, except US_POSTAL_CODE.
- Go back to your preparation with the column containing ZIP codes in Talend Data Preparation.
The change in semantic types is available instantly. Because you deleted the semantic type that was used until now, the ZIP column is automatically defined as text.
- To set the proper semantic type to the column, click the white arrow in the column header.
- Point your mouse over This column is a text and select US Postal Code.
This time, the data from the ZIP column can only be matched with the US Postal Code semantic type.
You have deleted all but one of the semantic types compatible with five-digit numbers. From now on, when you add new datasets, this type of data will be identified as US postal codes.
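If you have several semantic types to delete, you could script the deletion command used in this procedure. The following is a minimal Python sketch, assuming it runs from the <Dictionary_Service_Path>/command-line folder. It only wraps the documented -d -name command and picks category_manager.bat or category_manager.sh depending on the operating system. Every name except FR_POSTAL_CODE is a hypothetical placeholder: replace it with the names you identified in the -l -type REGEX output. Each call presumably prompts for your Talend Administration Center credentials, as the list command does, so expect one login prompt per semantic type.

```python
import platform
import subprocess

# FR_POSTAL_CODE comes from this procedure; OTHER_FIVE_DIGIT_TYPE is a
# hypothetical placeholder to replace with names from the "-l -type REGEX" output.
TYPES_TO_DELETE = ["FR_POSTAL_CODE", "OTHER_FIVE_DIGIT_TYPE"]


def delete_command(name: str, windows: bool) -> list[str]:
    """Build the argument list for the documented '-d -name <NAME>' call."""
    if windows:
        # Run the batch file through cmd.exe rather than launching it directly.
        return ["cmd", "/c", "category_manager.bat", "-d", "-name", name]
    return ["./category_manager.sh", "-d", "-name", name]


def delete_types(names: list[str]) -> None:
    """Delete each semantic type in turn, stopping at the first failed call."""
    windows = platform.system() == "Windows"
    for name in names:
        # stdin is inherited, so you can type your credentials when prompted.
        # check=True raises CalledProcessError on a non-zero exit code
        # (assuming the script signals failures that way).
        subprocess.run(delete_command(name, windows), check=True)
```

After editing the list, call delete_types(TYPES_TO_DELETE). Building the argument list in a separate function keeps the platform choice easy to check without actually deleting anything.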
To display a list of all the available commands in Talend Dictionary Service, enter the category_manager.bat -h command for Windows or ./category_manager.sh -h for Linux.