Creating a dictionary-based semantic type using a large source file

Talend Data Preparation User Guide

Author: Talend Documentation Team
Versions: 6.3, 2.0
Products: Talend Data Fabric, Talend Real-Time Big Data Platform, Talend Big Data Platform, Talend Big Data, Talend MDM Platform, Talend Data Integration, Talend Data Services Platform, Talend Data Management Platform, Talend ESB
Task: Data Quality and Preparation > Cleansing data
Platform: Talend Data Preparation
When you create a dictionary-based semantic type in Talend Dictionary Service using a source list containing more than 1,000 values, you must split the source list into smaller files.
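The splitting step can be automated. The following is a minimal sketch, not a Talend tool: it assumes the source list is a UTF-8 text file with one value per line, and the helper names (chunk_values, split_source_list) and the output naming scheme file1.txt, file2.txt, ... are hypothetical. Each output file holds at most 1,000 values, the threshold stated above.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Threshold from the guide: lists above 1,000 values must be split.
MAX_VALUES_PER_FILE = 1000


def chunk_values(values: list[str], max_values: int = MAX_VALUES_PER_FILE) -> list[list[str]]:
    """Split a list of values into consecutive chunks of at most max_values items."""
    if max_values <= 0:
        raise ValueError("max_values must be positive")
    return [values[i:i + max_values] for i in range(0, len(values), max_values)]


def split_source_list(source: Path, out_dir: Path, prefix: str = "file",
                      max_values: int = MAX_VALUES_PER_FILE) -> list[Path]:
    """Write file1.txt, file2.txt, ... into out_dir (hypothetical naming scheme).

    Assumes one dictionary value per line; blank lines are skipped.
    """
    lines = source.read_text(encoding="utf-8").splitlines()
    values = [line for line in lines if line.strip()]
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, chunk in enumerate(chunk_values(values, max_values), start=1):
        path = out_dir / f"{prefix}{index}.txt"
        path.write_text("\n".join(chunk) + "\n", encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python split_source.py big_list.txt samples/source
    for written_path in split_source_list(Path(sys.argv[1]), Path(sys.argv[2])):
        print(written_path)
```

Keeping the chunking logic in a separate pure function makes it easy to check the split sizes independently of any file handling.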

Procedure

  1. Split the source list into smaller .txt files, each containing 1,000 values or fewer, for example file1.txt and file2.txt.
  2. Add the files to the <Dictionary_Service_Path>/command-line/samples/source folder.

    This example uses this folder, but you can save the files to any location. If you do, adjust the path passed to the -src parameter accordingly.

  3. Open a command prompt window.
  4. Using the cd command, go to the <Dictionary_Service_Path>/command-line folder.
  5. To create the semantic type, execute the following command according to your operating system:
    • Windows: category_manager.bat -c -name <SemanticTypeName> -type DICT -cmpl true -desc "<Description>" -src samples/source/file1.txt

    • Linux: ./category_manager.sh -c -name <SemanticTypeName> -type DICT -cmpl true -desc "<Description>" -src samples/source/file1.txt

    You are prompted for your Talend Administration Center credentials. The command is executed after you enter a valid login and password.

  6. To update the semantic type with the values from file2.txt, execute the following command according to your operating system:
    • Windows: category_manager.bat -a -name <SemanticTypeName> -type DICT -cmpl true -desc "<Description>" -src samples/source/file2.txt

    • Linux: ./category_manager.sh -a -name <SemanticTypeName> -type DICT -cmpl true -desc "<Description>" -src samples/source/file2.txt

  7. To display the list of entries under the created semantic type, execute the following command according to your operating system:
    • Windows: category_manager.bat -e -name <SemanticTypeName>

    • Linux: ./category_manager.sh -e -name <SemanticTypeName>

    The output shows that the values from both file1.txt and file2.txt have been added to the semantic type.
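When the source list is split into more than two files, the pattern above generalizes: create the semantic type with -c from the first file, append each remaining file with -a, then list the entries with -e. The sketch below only builds and prints those command lines; it uses the flags shown in the procedure and nothing else. The function name category_manager_commands is hypothetical, and the script assumes you run the printed commands yourself from the <Dictionary_Service_Path>/command-line folder, entering your Talend Administration Center credentials when prompted.

```python
from __future__ import annotations

import shlex
import subprocess
import sys


def category_manager_commands(name: str, description: str, sources: list[str],
                              windows: bool | None = None) -> list[list[str]]:
    """Build the create (-c), append (-a) and list (-e) commands in order."""
    if not sources:
        raise ValueError("at least one source file is required")
    if windows is None:
        windows = sys.platform.startswith("win")
    exe = "category_manager.bat" if windows else "./category_manager.sh"
    commands = []
    for index, src in enumerate(sources):
        # The first file creates the semantic type; the others add to it.
        action = "-c" if index == 0 else "-a"
        commands.append([exe, action, "-name", name, "-type", "DICT",
                         "-cmpl", "true", "-desc", description, "-src", src])
    # Final command displays the entries of the semantic type.
    commands.append([exe, "-e", "-name", name])
    return commands


def to_command_line(args: list[str], windows: bool) -> str:
    """Quote the arguments for the target shell."""
    return subprocess.list2cmdline(args) if windows else shlex.join(args)


if __name__ == "__main__":
    on_windows = sys.platform.startswith("win")
    files = ["samples/source/file1.txt", "samples/source/file2.txt"]
    for cmd in category_manager_commands("<SemanticTypeName>", "<Description>",
                                         files, windows=on_windows):
        print(to_command_line(cmd, on_windows))
```

Printing the commands rather than executing them keeps the interactive credential prompt under your control for each call.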