Enriching the semantic types libraries through command line interface - 6.5

Talend Data Preparation User Guide

Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Talend Data Preparation

When you add a dataset, Talend Data Preparation automatically suggests one of the supported semantic types for each column. If the semantic type suggested for a column is not the one you want, you can change it manually by clicking the white arrow in the column header.

This lets you choose from the list of semantic types available in Talend Data Preparation by default. See Predefined Semantic Types for more information. You can go further by creating your own semantic types, as well as updating or deleting the existing ones, so that Talend Data Preparation speaks your business language.

Semantic types are modified using Talend Dictionary Service. This tool stores all the semantic libraries used in various Talend products, including Talend Data Preparation. All the changes that you make on the Talend Dictionary Service server are instantly available in Talend Data Preparation. The availability of Talend Dictionary Service depends on your license.

In Talend Dictionary Service, the semantic types are divided into three main categories:
  • The DICT type, based on an open or closed list of values.
  • The REGEX type, which compares your data against a preselected regular expression.
  • The COMPOUND type, under which you can group several existing types.
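
To make the three categories more concrete, the following Python sketch models how each kind of type could decide whether a value matches. It is a conceptual illustration only, not the Talend Dictionary Service implementation: the class names, the sample values, and the regular expression are all hypothetical, the open/closed distinction for DICT lists is ignored, and treating a COMPOUND type as "matches if any grouped type matches" is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Protocol, Set


class SemanticType(Protocol):
    """Anything that can say whether a value belongs to the type."""

    def matches(self, value: str) -> bool: ...


@dataclass
class DictType:
    """DICT-like type: a value matches if it is in a list of known values."""
    values: Set[str]  # stored in lower case for case-insensitive lookup

    def matches(self, value: str) -> bool:
        return value.strip().lower() in self.values


@dataclass
class RegexType:
    """REGEX-like type: a value matches if the whole value fits the pattern."""
    pattern: str

    def matches(self, value: str) -> bool:
        return re.fullmatch(self.pattern, value.strip()) is not None


@dataclass
class CompoundType:
    """COMPOUND-like type: groups existing types (assumed: any child may match)."""
    children: List[SemanticType]

    def matches(self, value: str) -> bool:
        return any(child.matches(value) for child in self.children)


# Hypothetical sample types, for illustration only.
airline = DictType({"air france", "lufthansa"})
flight_code = RegexType(r"[A-Z]{2}\d{1,4}")
flight_info = CompoundType([airline, flight_code])
```

With these sample definitions, `flight_info.matches("AF123")` and `flight_info.matches("Lufthansa")` both succeed, because the compound type delegates to the types it groups.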

To display a list of all the available commands in Talend Dictionary Service, go to <Dictionary_Service_Path>/command-line and enter the following command according to your operating system:

  • category_manager.bat -h for Windows.
  • ./category_manager.sh -h for Linux.
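
If you call the help command from a script that has to work on both operating systems, the choice between the two launchers can be sketched as follows. The function name `help_command` is hypothetical, and falling back to the .sh script on any system other than Windows is an assumption; only the two commands themselves come from this page.

```python
import platform
from typing import List, Optional


def help_command(system: Optional[str] = None) -> List[str]:
    """Return the category_manager help invocation for an OS name.

    `system` uses the names returned by platform.system(), for example
    "Windows" or "Linux". When omitted, the current OS is detected.
    """
    system = system or platform.system()
    if system == "Windows":
        return ["category_manager.bat", "-h"]
    # Assumption: every non-Windows system uses the shell script.
    return ["./category_manager.sh", "-h"]
```

The returned list contains only the command and its option. It still has to be run from the <Dictionary_Service_Path>/command-line folder.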

To enable the interaction between Talend Dictionary Service and Talend Data Preparation, you must fulfill the following prerequisites:

  • Talend Dictionary Service is installed and running.
  • Talend Administration Center is installed and running.
  • Your Talend Administration Center user type is either Master Data Management or Data Quality.
  • The Data Preparation User check box is selected for your user in Talend Administration Center with any of the three possible roles set in the Data Preparation Role field.
  • In the <install_folder>\dataprep\config\application.properties file, the dataquality.semantic.update.enable property is set to true.
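
The last prerequisite can be checked with a script before you start. The following Python sketch reads the text of a properties file and reports whether dataquality.semantic.update.enable is set to true. The function names and the sample content are hypothetical, and the parser is deliberately simplified: it handles plain key=value lines and comment lines only, not the full Java properties syntax.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def semantic_update_enabled(text: str) -> bool:
    """Return True only if the property is present and set to true."""
    props = parse_properties(text)
    return props.get("dataquality.semantic.update.enable", "").lower() == "true"


# Hypothetical excerpt standing in for the content of application.properties.
sample = """
# semantic types settings
dataquality.semantic.update.enable=true
"""
```

To check a real installation, read the content of application.properties from your own install folder and pass it to `semantic_update_enabled`.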