Enriching the semantic types for Talend Dictionary Service (command line) - 7.1

Talend Data Stewardship User Guide

Talend Dictionary Service stores the semantic categories used in various Talend products, including Talend Data Stewardship. You can enrich these semantic types with your own categories, and every change you make is immediately available in Talend Data Stewardship. Whether Talend Dictionary Service is available to you depends on your license.

Talend Data Stewardship is aware of the data model, which enables both syntactic and semantic validation of data. You define the attributes in the data model and select their types from the predefined standard types or from the semantic types stored in Talend Dictionary Service.

When campaign owners define the structure of the data to be managed in a campaign, they can select the semantic type of each attribute from a predefined list. When they then load data into Talend Data Stewardship, the data is validated internally against the schema types, and each value is displayed as valid or invalid accordingly.

For example, the default list of country entries in Talend Data Stewardship does not include Republic of Angola, United States of America, or UK. As a result, these entries are considered invalid country names when loaded into Talend Data Stewardship.

You can go further and create your own semantic types, or update or delete existing ones, so that Talend Data Stewardship speaks your business language. You can perform all of these management operations either through the user interface integrated in Talend Data Stewardship or through the command line interface.

On the server, the semantic types fall into three categories:
  • The Dictionary type, which validates your data against a closed list of values.
  • The Regular expression type, which compares your data against a preselected regular expression.
  • The Compound type, which compares your data against the several semantic types that it references.
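
The three categories can be pictured with a minimal Python sketch. It is a conceptual illustration only, not how Talend Dictionary Service is implemented: the class names, the sample values, the postal-code pattern, exact-match comparison, and the rule that a compound type accepts a value matching any of its referenced types are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DictionaryType:
    """Hypothetical model: a value is valid if it is in a closed list."""
    name: str
    values: set = field(default_factory=set)

    def is_valid(self, value: str) -> bool:
        return value.strip() in self.values


@dataclass
class RegexType:
    """Hypothetical model: a value is valid if it fully matches a pattern."""
    name: str
    pattern: str

    def is_valid(self, value: str) -> bool:
        return re.fullmatch(self.pattern, value.strip()) is not None


@dataclass
class CompoundType:
    """Hypothetical model: valid if ANY referenced type accepts the value
    (the any-match rule is an assumption of this sketch)."""
    name: str
    children: list

    def is_valid(self, value: str) -> bool:
        return any(child.is_valid(value) for child in self.children)


# Sample data invented for the illustration.
country = DictionaryType("country", {"Angola", "United States", "United Kingdom"})
us_zip = RegexType("us_zip", r"\d{5}(-\d{4})?")
location = CompoundType("location", [country, us_zip])

print(country.is_valid("UK"))   # False: "UK" is not in the closed list
country.values.add("UK")        # enrich the dictionary with a new entry
print(country.is_valid("UK"))   # True
```

Adding "UK" to the closed list mirrors what enriching a semantic type achieves: a value that was previously flagged as invalid becomes valid without any change to the data itself.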

To display the list of all available commands in Talend Dictionary Service, go to <Dictionary_Service_Path>/command-line and enter the command that matches your operating system:
  • category_manager.bat -h on Windows.
  • ./category_manager.sh -h on Linux.
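
If you want to call the help command from a script rather than typing it, the choice of launcher can be sketched in Python as follows. Only the script names and the -h option come from this page; the function names help_command and show_help are hypothetical helpers, and the folder you pass stands for <Dictionary_Service_Path>/command-line.

```python
import platform
import subprocess
from pathlib import Path


def help_command(command_line_dir, system=None):
    """Build the argument list that prints the available commands.

    command_line_dir stands for <Dictionary_Service_Path>/command-line.
    system defaults to the current platform and can be overridden for testing.
    """
    system = system or platform.system()
    script = "category_manager.bat" if system == "Windows" else "category_manager.sh"
    return [str(Path(command_line_dir) / script), "-h"]


def show_help(command_line_dir):
    """Run the help command from inside the command-line folder."""
    folder = Path(command_line_dir).resolve()
    subprocess.run(help_command(folder), cwd=folder, check=True)
```

Calling show_help with the path of your own command-line folder runs the launcher there and raises an error if the command exits with a non-zero status.
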

To enable interaction between Talend Dictionary Service and Talend Data Stewardship, the following prerequisites must be met:
  • Talend Dictionary Service is installed and running.
  • Talend Administration Center is installed and running.
  • You have a Platform license.
  • The role assigned to you in Talend Administration Center is either Designer or Operation manager.
  • The Dictionary Service User and Data Stewardship User check boxes are selected for your user account in Talend Administration Center, and one of the two possible roles is set in the Data Stewardship Role field.
  • In the <install_folder>\tds\apache-tomcat\conf\data-stewardship.properties file, the dataquality.dictionaryservice.enable property is set to true.
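
The last prerequisite can be checked with a short script. The sketch below is a hedged example, not a Talend tool: the function name dictionary_service_enabled is hypothetical, and it assumes simple key=value lines, treats the value case-insensitively, and lets the last occurrence of the property win.

```python
def dictionary_service_enabled(properties_text):
    """Return True when dataquality.dictionaryservice.enable is set to true.

    Assumes plain key=value lines; lines starting with # or ! are comments.
    """
    enabled = False
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "dataquality.dictionaryservice.enable":
            # Later occurrences override earlier ones.
            enabled = value.strip().lower() == "true"
    return enabled


# Illustrative excerpt; a real file contains many other properties.
sample = """
# excerpt in the style of data-stewardship.properties
dataquality.dictionaryservice.enable=true
"""
print(dictionary_service_enabled(sample))  # True
```

To check a real installation, read the text of the data-stewardship.properties file from your own install folder and pass it to the function.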