Enriching the semantic types for Data Stewardship - Cloud

Talend Cloud Data Stewardship User Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Stewardship
Last publication date: 2024-01-30
Talend Dictionary Service stores the semantic categories used in various Talend products, including Talend Cloud Data Stewardship. You can enrich these semantic types with your own categories, and the changes you make are available immediately. However, the availability of Talend Dictionary Service depends on your license.
Note: You can upload up to 10 MB of content to Talend Dictionary Service per tenant.
To enable the interaction between Talend Dictionary Service and Talend Cloud Data Stewardship, you must fulfill the following prerequisites:
  • You have a Platform license.
  • Your Talend Cloud user has the Semantic types manager role of the Dictionary service application assigned in Talend Cloud Data Stewardship, in addition to any of the Talend Cloud Data Stewardship roles.
Note: If you are using a trial version of Talend Cloud Data Stewardship, semantic types management will not be available.

When campaign owners define the structure of the data to be managed in a campaign, they can select the semantic type for each attribute from a predefined list. When they then load data into Talend Cloud Data Stewardship, the schema type is validated internally and the data is displayed as valid or invalid accordingly.

Valid and invalid data in a campaign.

For example, the list of entries included by default in the application under countries does not include Republic of Angola or UK. As a result, these entries are considered invalid country names when loaded into Talend Cloud Data Stewardship.
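The effect of a closed list on validation can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the `DEFAULT_COUNTRIES` set is a hypothetical stand-in for the default list, the `validate` helper is invented for the example, and case-insensitive exact matching is an assumption.

```python
# Hypothetical stand-in for the default "countries" entries (not the real list).
DEFAULT_COUNTRIES = {"angola", "france", "united kingdom"}


def validate(values, dictionary):
    """Pair each value with True/False depending on whether it is in the list.

    Matching here is case-insensitive and exact, which is an assumption
    made for this sketch.
    """
    return [(value, value.strip().lower() in dictionary) for value in values]


loaded = ["France", "Republic of Angola", "UK"]

# With the default entries only, two of the three values are flagged invalid.
before = validate(loaded, DEFAULT_COUNTRIES)

# Enriching the list with business-specific entries makes them valid.
enriched = DEFAULT_COUNTRIES | {"republic of angola", "uk"}
after = validate(loaded, enriched)
```

In this sketch, `before` flags Republic of Angola and UK as invalid, while `after` marks all three values as valid once the list has been enriched.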

However, you can go further and create your own semantic types, as well as update or delete existing ones, so that your experience with Talend Cloud Data Stewardship speaks your business language. You can perform all these management operations through an integrated user interface.

When you create semantic types, you can decide either to use them for data validation or to use them for data discovery:
  • Data validation matches data against semantic types and marks data as valid or not valid.
  • Data discovery explores the semantic categories, queries complex semantic relationships in the data you analyze, and outputs the matching results to show the most relevant concepts.

Talend Cloud Data Stewardship uses the semantic types only for validation, as no data discovery is done on its side.
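The difference between the two usages can be illustrated with a rough Python sketch. Everything in it is hypothetical: the `SEMANTIC_TYPES` table, the two helper functions, and the idea of ranking types by the share of matching values are assumptions chosen for illustration and do not describe how Talend Dictionary Service actually scores matches.

```python
import re

# Hypothetical semantic types: each name maps to a predicate on one value.
SEMANTIC_TYPES = {
    "COUNTRY": lambda v: v.lower() in {"france", "angola", "germany"},
    "EMAIL": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def validate_column(values, type_name):
    """Validation: the type is chosen up front; each value is valid or not."""
    check = SEMANTIC_TYPES[type_name]
    return [check(value) for value in values]


def discover_column(values):
    """Discovery (simplified): rank all types by the share of matching values."""
    if not values:
        return []
    scores = {
        name: sum(check(value) for value in values) / len(values)
        for name, check in SEMANTIC_TYPES.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


column = ["France", "Angola", "UK"]
validity = validate_column(column, "COUNTRY")
ranking = discover_column(column)
```

Only the `validate_column` half corresponds to what Talend Cloud Data Stewardship does with semantic types. The `discover_column` half is included solely to contrast the two usages.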

On the server, the semantic types are divided into several categories:
  • The Dictionary type, which is based on a closed list of values.
  • The Regular expression type, which compares your data against a preselected regular expression.
  • The Compound type, which compares your data against the several semantic types that it references.
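The three categories can be modeled as a minimal Python sketch. The class names and the `matches` method are hypothetical and are not part of any Talend API. The sketch also assumes that a compound type accepts a value as soon as any one of its referenced types accepts it, which the description above does not state explicitly.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryType:
    """Matches a value only if it appears in a closed list of values."""
    values: frozenset

    def matches(self, value: str) -> bool:
        return value in self.values


@dataclass(frozen=True)
class RegexType:
    """Matches a value only if the whole value fits the regular expression."""
    pattern: str

    def matches(self, value: str) -> bool:
        return re.fullmatch(self.pattern, value) is not None


@dataclass(frozen=True)
class CompoundType:
    """References other semantic types.

    Assumption for this sketch: a value matches if ANY child type matches.
    """
    children: tuple

    def matches(self, value: str) -> bool:
        return any(child.matches(value) for child in self.children)


# Hypothetical example: a country is either a known name or a two-letter code.
country_name = DictionaryType(frozenset({"France", "Angola"}))
country_code = RegexType(r"[A-Z]{2}")
country = CompoundType((country_name, country_code))
```

With these definitions, `country.matches("France")` and `country.matches("FR")` both return `True`, while `country.matches("Fr")` returns `False` because it is neither in the closed list nor a two-letter uppercase code.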