Updating an existing semantic type through the user interface - Cloud

Talend Cloud Data Preparation User Guide

author: Talend Documentation Team
EnrichVersion: Cloud
EnrichProdName: Talend Cloud
task: Data Quality and Preparation > Cleansing data
EnrichPlatform: Talend Data Preparation

You can edit an existing semantic type in Talend Dictionary Service to change how your data is validated in Talend Data Preparation.

Predefined semantic types in Talend Data Preparation are based on standard values, but you may need to tailor them to match your own data. Some data that you would expect to fall under a predefined category may be considered invalid.

Let's take the example of a dataset containing a list of customers, with their email addresses, dates of birth, and the countries they live in. All the entries for United States of America are considered invalid, even though this is the official name of the country.

The problem here is that United States of America is not one of the expected values for the country semantic type in Talend Data Preparation. The valid entry in this case would be United States.

To avoid this problem in the future, you will update the country semantic type in Talend Dictionary Service and add United States of America to the list of valid entries. The change will be automatically available in Talend Data Preparation.
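The effect of adding a synonym can be pictured with a small sketch. This is only a conceptual model, not Talend's implementation or API: the row layout, the function names (`valid_values`, `is_valid`), and the exact-match rule after trimming whitespace are all assumptions made for illustration.

```python
# Conceptual model only: a dictionary-style semantic type is represented as
# rows of values, where values sharing a row are treated as synonyms.
# All names and the matching rule are hypothetical, not Talend's API.

def valid_values(rows):
    """Flatten the rows of synonyms into one set of accepted values."""
    return {value.strip() for row in rows for value in row}


def is_valid(value, rows):
    """Return True if the trimmed value appears anywhere in the type's rows."""
    return value.strip() in valid_values(rows)


# The country type before the update: only "United States" is listed.
country_before = [["France"], ["United States"]]

# After the update: "United States of America" shares a row with
# "United States", so both are accepted.
country_after = [["France"], ["United States", "United States of America"]]

print(is_valid("United States of America", country_before))
print(is_valid("United States of America", country_after))
```

In this model, the official country name is rejected by the first list and accepted by the second, which mirrors the change made in the procedure that follows.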

Procedure

  1. Open the Semantic types view from the left panel of the Talend Data Preparation homepage.
  2. From the list of existing semantic types, click the Country type to open it.
    In this window, all the parameters of the semantic type can be modified, including the list of entries used to discover or validate data.
  3. In the Values list, point your mouse over the United States entry and click the pen icon that is displayed on the right.
  4. Right after United States, enter United States of America as a second value, separated by a comma.
  5. Click the tick icon to validate your change.

    These two values, entered in the same row, are now set as synonyms. As a consequence, United States of America will now be considered a valid value for the country semantic type.

  6. Click Save and publish to propagate the change to Talend Dictionary Service and make it available to Talend Data Preparation users.

    The change to the semantic type takes effect immediately in Talend Data Preparation for every new dataset that you import. For existing datasets, you need to duplicate the column or reimport the dataset.

  7. Go back to your dataset with the column containing the customers' countries.
  8. Duplicate the column with the updated semantic type applied, Country in this case.

    You can see in the quality bar under the column header that there are no invalid values anymore.
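The change seen in the quality bar can be approximated with a short sketch that counts cells per category before and after the update. It is a hypothetical illustration: the `quality_summary` function, the three categories, and the sample column are assumptions, and the real quality bar is computed by Talend Data Preparation itself.

```python
# Hypothetical sketch of a per-column quality summary; the function name,
# categories, and sample data are illustrative, not taken from Talend.

def quality_summary(column, accepted):
    """Count valid, invalid, and empty cells against a set of accepted values."""
    counts = {"valid": 0, "invalid": 0, "empty": 0}
    for cell in column:
        text = cell.strip()
        if not text:
            counts["empty"] += 1
        elif text in accepted:
            counts["valid"] += 1
        else:
            counts["invalid"] += 1
    return counts


# Sample customer countries, including one empty cell.
countries = ["United States of America", "France", "", "United States of America"]

before = quality_summary(countries, {"France", "United States"})
after = quality_summary(
    countries, {"France", "United States", "United States of America"}
)

print("before:", before)
print("after:", after)
```

With this sample column, the two United States of America cells move from the invalid count to the valid count once the value is accepted, while the empty cell is unaffected.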

Results

The country semantic type has been manually updated to support a new value.

From now on, when dealing with data that are matched with the country semantic type, United States of America will be considered a valid value.