Removing a semantic type - 7.1

Talend Data Stewardship User Guide

Version: 7.1
Language: English (United States)
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Stewardship

You can delete a semantic type in Talend Dictionary Service to remove it from the list of recognized data types in Talend Data Stewardship.

You can delete both predefined semantic types and predefined standard types.

The variety of semantic types available by default in Talend Data Stewardship can cause ambiguity in certain situations. For example, a five-digit number can be interpreted as an American ZIP code, but also as a French or German postal code, since all three share the same format.
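The ambiguity can be illustrated with a minimal shell sketch. The pattern below is a plain five-digit regular expression chosen for illustration; it is an assumption, not the actual definition used by any Talend semantic type, and is_five_digits is a hypothetical helper name.

```shell
#!/bin/sh
# Illustrative only: a plain five-digit pattern, not the actual regular
# expression behind US_POSTAL_CODE, FR_POSTAL_CODE or DE_POSTAL_CODE.
is_five_digits() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{5}$'
}

# Sample postal codes: Beverly Hills (US), Paris (FR), Berlin (DE).
for code in 90210 75001 10115; do
  if is_five_digits "$code"; then
    echo "$code matches the five-digit format"
  fi
done
```

All three sample values pass the same check, so the format alone cannot tell which country a value comes from.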

Let's say that you are working in an American company, and you only have to deal with data coming from American clients, including ZIP codes. You would prefer to keep only the American ZIP code in the list of recognized semantic types.

Using Talend Dictionary Service, you remove the other semantic types that match the five-digit format and leave only US_POSTAL_CODE. The change is applied immediately in Talend Data Stewardship, and from now on a ZIP code column is always validated against the US_POSTAL_CODE semantic type.

Procedure

  1. Open a command prompt window.
  2. Using the cd command, go to the <Dictionary_Service_Path>/command-line folder.
  3. To display the names of the existing semantic types and see which ones to remove, execute the following command according to your operating system:
    • category_manager.bat -l -type REGEX for Windows.
    • ./category_manager.sh -l -type REGEX for Linux.
    You are prompted for your Talend Administration Center credentials. The command is executed after you enter a valid login and password.

    The list of semantic types based on regular expressions is displayed. You can identify the names of the ones you want to remove, such as FR_POSTAL_CODE or DE_POSTAL_CODE.

  4. To remove the French postal codes semantic type, execute the following command according to your operating system:
    • category_manager.bat -d -name FR_POSTAL_CODE for Windows.
    • ./category_manager.sh -d -name FR_POSTAL_CODE for Linux.
    The FR_POSTAL_CODE semantic type is removed from the list of recognized semantic types, and you can no longer associate five-digit numbers with French postal codes when creating data models in Talend Data Stewardship.
  5. Repeat this operation to remove the other semantic types that match five-digit numbers:
    • DE_POSTAL_CODE
    • FR_INSEE_CODE
    When you delete a semantic type that is already used on a column in a data model attached to a campaign, the semantic type of the column is automatically set to text. This means that data which could be displayed as invalid with the initial semantic type may appear valid with the text semantic type.
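On Linux, the repeated deletions in step 5 could be scripted as a loop. The following is a minimal sketch: it uses only the -d -name option shown in step 4, assumes it is run from the <Dictionary_Service_Path>/command-line folder, and introduces a hypothetical delete_types helper and DRY_RUN variable that are not part of Talend Dictionary Service. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper, not part of Talend Dictionary Service.
# DRY_RUN defaults to 1 so that nothing is deleted unless you opt in.
DRY_RUN="${DRY_RUN:-1}"

delete_types() {
  for name in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      # Print the command instead of running it.
      echo "./category_manager.sh -d -name $name"
    else
      # Each call may prompt for Talend Administration Center credentials.
      ./category_manager.sh -d -name "$name"
    fi
  done
}

delete_types DE_POSTAL_CODE FR_INSEE_CODE
```

Review the printed commands first, then run the script with DRY_RUN=0 to perform the deletions. Because deleting a semantic type that is in use resets the affected columns to text, checking the list before deleting is worthwhile.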

Results

You have deleted all but one of the semantic types compatible with five-digit numbers. From now on, when adding new data models, you can set only US_POSTAL_CODE as the semantic type for columns with ZIP code data.

To display a list of all the available commands in Talend Dictionary Service, go to <Dictionary_Service_Path>/command-line and enter the following command according to your operating system:
  • category_manager.bat -h for Windows.
  • ./category_manager.sh -h for Linux.