Updating an existing semantic type through command line interface - 2.3

Talend Data Preparation User Guide

author
Talend Documentation Team
EnrichVersion
6.5
2.3
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
task
Data Quality and Preparation > Cleansing data
EnrichPlatform
Talend Data Preparation

You can edit an existing semantic type in Talend Dictionary Service to impact how your data is validated in Talend Data Preparation.

Predefined semantic types in Talend Data Preparation are based on standard values, but you may need to tailor them to match your own data. Some data that you would expect to fall under a predefined category may be considered invalid.

Take the example of a dataset containing a list of customers, with their email addresses, dates of birth, and the countries they live in. Notice that all the entries for United States of America are considered invalid, although they should not be, since it is the official name of the country.

The problem here is that United States of America is not one of the expected values for the country semantic type in Talend Data Preparation. The valid entry in this case would be United States.

To avoid having this problem in the future, you will update the country semantic type in Talend Dictionary Service, and add United States of America to the list of valid entries. The change will be automatically available in Talend Data Preparation.

Procedure

  1. Open a command prompt window.
  2. Using the cd command, go to the <Dictionary_Service_Path>/command-line folder.
  3. To add the value United States of America to the list of valid countries, execute the following command according to your operating system:
    • category_manager.bat -a -name COUNTRY -value "United States of America" for Windows.
    • ./category_manager.sh -a -name COUNTRY -value "United States of America" for Linux.

    Note that you must enter this command on a single line.

    You are prompted for your Talend Administration Center credentials. The command is executed after you enter a valid login and password.

  4. To display the list of entries under the country semantic type, execute the following command according to your operating system:
    • category_manager.bat -e -name COUNTRY for Windows.
    • ./category_manager.sh -e -name COUNTRY for Linux.

    You can see that United States of America has been properly added at the bottom of the list of valid entries for the country semantic type.

  5. Go back to Talend Data Preparation and open your dataset with the column containing the customers' countries.

    The change in semantic types is instantly available in Talend Data Preparation, but you need to manually refresh the column to make it visible in your existing datasets and preparations.

  6. To make the change in the countries list active, you can either:
    • import your dataset again.
    • make a copy of the column whose semantic type you want to update, COUNTRY in this example.

    You can see in the quality bar under the column header that there are no invalid values anymore.
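
The command-line part of this procedure (steps 2 to 4) can be sketched as a small shell function for Linux. This is only an illustration, not part of Talend Dictionary Service: the function name add_semantic_value and the DICT_CLI_DIR variable are hypothetical, and DICT_CLI_DIR is assumed to point to your <Dictionary_Service_Path>/command-line folder. Only the -a, -e, -name and -value options shown in the steps above are used.

```shell
#!/bin/sh
# Sketch for Linux only. DICT_CLI_DIR is a hypothetical variable that must
# point to <Dictionary_Service_Path>/command-line on your machine.

# Add one value to a semantic type, then list the entries of that type.
# Each category_manager.sh call prompts for Talend Administration Center
# credentials, so this function is meant to be run interactively.
add_semantic_value() {
    category="$1"   # for example COUNTRY
    value="$2"      # for example "United States of America"

    # Run in a subshell so the caller's working directory is left unchanged.
    (
        cd "$DICT_CLI_DIR" || exit 1
        # List the entries only if the value was added successfully.
        ./category_manager.sh -a -name "$category" -value "$value" &&
            ./category_manager.sh -e -name "$category"
    )
}
```

With DICT_CLI_DIR set, calling add_semantic_value COUNTRY "United States of America" would run the same two commands as steps 3 and 4. Quoting "$value" keeps a multi-word value such as United States of America as a single argument.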

Results

The country semantic type has been manually updated to support a new value.

From now on, when dealing with data that are matched with the country semantic type, United States of America will be considered a valid value.
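
If you want to confirm this from the command line rather than by reading the listing, a check along the following lines may help. It is a sketch under a loud assumption: the listing printed by the -e option is assumed to contain one entry per line with no surrounding decoration, which is not confirmed here. The has_entry function is a hypothetical helper, not a Talend command.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the exact value appears on a line of its
# own in the listing read from standard input.
# Assumption: the listing has one entry per line, with no extra characters.
has_entry() {
    # -F: treat the value as a fixed string, -x: match the whole line,
    # -q: print nothing and report the result through the exit status.
    grep -Fxq -- "$1"
}
```

For example, after saving the listing to a file, has_entry "United States of America" < listing.txt would succeed only if that exact entry is present. Whole-line matching prevents United States from being mistaken for United States of America. How the credential prompt behaves when the output is redirected is not covered here, so you may need to adapt how the listing is captured.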

To display a list of all the available commands in Talend Dictionary Service, go to <Dictionary_Service_Path>/command-line and enter the following command according to your operating system:

  • category_manager.bat -h for Windows.
  • ./category_manager.sh -h for Linux.