You can edit an existing semantic type in Talend Dictionary Service to change how your data is validated in the sample view of the application. Predefined semantic types are based on standard values, but you may need to tailor them to match your own data. Some data that you would expect to fall under a predefined category may be considered invalid.
Let's take the example of a dataset containing a list of customers, with their email addresses, dates of birth, and the countries they live in. You notice that all the entries for America are considered invalid. While it is indeed not a valid country name, it is the value that your company uses, and you would like to make it a valid value.
The problem here is that America is not one of the expected values for the country semantic type in Talend Dictionary Service. The valid entry in this case would be United States or United States of America. To avoid having this problem in the future, you will update the country semantic type in Talend Dictionary Service, and add America to the list of valid entries. The change will be automatically available in Talend Cloud Data Inventory and the other cloud applications.
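To make the cause of the invalid entries concrete, here is a minimal Python sketch of dictionary-based validation. It is not how Talend Dictionary Service is implemented: the `COUNTRY_VALUES` set, the `is_valid` helper, and the case-insensitive matching are all assumptions made for illustration.

```python
# Hypothetical, simplified stand-in for a dictionary-based semantic type.
# The entries below are illustrative, not the actual Talend list.
COUNTRY_VALUES = {"united states", "united states of america", "france", "germany"}


def is_valid(value, valid_values):
    """Return True when the value matches one of the dictionary entries.

    Matching ignores case and surrounding whitespace; that behavior is an
    assumption of this sketch.
    """
    return value.strip().lower() in valid_values


# A small sample of the customers' country column.
column = ["France", "America", "United States", "America"]

# Every value that is missing from the dictionary is flagged as invalid.
invalid = [v for v in column if not is_valid(v, COUNTRY_VALUES)]
print(invalid)  # ['America', 'America']
```

Because America is absent from the list of expected values, every occurrence is flagged, even though the intended country is obvious to a human reader.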
- From the left panel of the homepage, open the Semantic type view.
- From the list of existing semantic types, click the Country semantic type to open it. In this window, all the parameters of the semantic type can be modified, including the list of entries used to discover or validate data.
- In the values list, point your mouse over the United States entry and click the pen icon that is displayed on the right.
- Right after United States, enter America as a new value, separated by a comma.
- Click the check icon to validate your change. All the comma-separated values that are in the same row are set as synonyms. As a consequence, America will now also be considered a valid value for the country semantic type.
- Click Save and publish to propagate the change in Talend Dictionary Service and make it available for all users. The change in semantic type is instantly available in Talend Cloud Data Inventory for every new dataset that you create. For existing datasets, you will need to refresh the sample in order to recalculate the quality with the new value.
- Go back to your dataset with the column containing the customers' countries.
- Click the Refresh sample button.
The country semantic type has been manually updated to support a new value, and the quality bar under the column header shows that there are no invalid values anymore.
From now on, when dealing with data that is matched with the country semantic type, America will be considered a valid value.
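The effect of the procedure, synonyms stored as comma-separated values in one row followed by a refreshed quality count, can be sketched in Python as follows. This is only an illustration under stated assumptions: `parse_row`, `build_dictionary`, and `quality` are hypothetical helpers, the rows are sample data, and the matching rules are not taken from Talend Dictionary Service.

```python
def parse_row(row):
    """Split one comma-separated dictionary row into a set of synonyms."""
    return {item.strip().lower() for item in row.split(",") if item.strip()}


def build_dictionary(rows):
    """Merge all rows into one set of accepted values (hypothetical model)."""
    valid = set()
    for row in rows:
        valid |= parse_row(row)
    return valid


def quality(column, valid_values):
    """Return (valid_count, invalid_count) for a column sample."""
    valid = sum(1 for v in column if v.strip().lower() in valid_values)
    return valid, len(column) - valid


sample = ["France", "America", "United States", "America"]

# Dictionary rows before and after adding America next to United States.
before = build_dictionary(["United States", "France"])
after = build_dictionary(["United States, America", "France"])

print(quality(sample, before))  # (2, 2)
print(quality(sample, after))   # (4, 0)
```

Recomputing the counts against the updated dictionary plays the role of the Refresh sample step: the same column goes from two invalid values to none.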