Adding a new compound semantic type - 7.1

Talend Data Stewardship User Guide

Version: 7.1
Language: English (United States)
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Stewardship
Content: Administration and Monitoring > Managing users; Data Governance > Assigning tasks; Data Governance > Managing campaigns; Data Governance > Managing data models; Data Quality and Preparation > Handling tasks; Data Quality and Preparation > Managing semantic types

You can create a compound semantic type that references other semantic types published on Talend Dictionary Service, and add it to the list of data types recognized in the data models in Talend Data Stewardship.

You can mix semantic types of any kind when creating a compound type. A compound semantic type can also reference other compound types, provided that all child types are already published.
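The publication rule above can be pictured with a minimal sketch. The `SemanticType` class, the `can_create_compound` helper and the type names are hypothetical and serve only as an illustration; they are not part of any Talend API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SemanticType:
    """Hypothetical stand-in for a semantic type; not a Talend object."""
    name: str
    published: bool = False
    # An empty list means a simple type; a non-empty list means a compound type.
    children: List["SemanticType"] = field(default_factory=list)


def can_create_compound(children: List[SemanticType]) -> bool:
    """A compound type may mix any child types, as long as each one is published."""
    return all(child.published for child in children)


# Hypothetical child types: two published, one still saved as a draft.
us_zip = SemanticType("US_zip", published=True)
fr_zip = SemanticType("FR_zip", published=True)
de_zip_draft = SemanticType("DE_zip", published=False)

# A published compound type can itself be a child of another compound type.
eu_zip = SemanticType("EU_zip", published=True, children=[fr_zip])

print(can_create_compound([us_zip, eu_zip]))        # True
print(can_create_compound([us_zip, de_zip_draft]))  # False
```

In this sketch, a type saved as a draft blocks the creation of the compound type until it is published.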

Let's say that you have a file that holds information about customers from the US, the UK, Germany and France. You need to validate the different zip codes against a compound semantic type that you create. As soon as a value matches one of the child types, it is considered valid and is not evaluated against the other referenced types.
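The first-match behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the child type names and the regular expressions are hypothetical and deliberately simplified, and the code does not reflect how Talend Dictionary Service is implemented.

```python
import re

# Hypothetical child types with simplified patterns; real postal-code
# rules are stricter than these.
CHILD_TYPES = [
    ("US_zip", re.compile(r"\d{5}(-\d{4})?")),
    ("UK_postcode", re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}")),
    ("DE_zip", re.compile(r"\d{5}")),
    ("FR_zip", re.compile(r"\d{5}")),
]


def first_matching_child(value):
    """Return the name of the first child type the value matches, or None."""
    for name, pattern in CHILD_TYPES:
        if pattern.fullmatch(value):
            return name  # stop here: the remaining child types are not evaluated
    return None


def is_valid(value):
    """A value is valid for the compound type if any child type matches it."""
    return first_matching_child(value) is not None


for zip_code in ["90210-1234", "SW1A 1AA", "75008", "ABC"]:
    print(zip_code, first_matching_child(zip_code), is_valid(zip_code))
```

With these simplified patterns, a five-digit French or German code already matches the first child, so it is accepted without the later children being evaluated. The order of the children changes which child matches, but not whether the value is valid.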

When defining the data model in Talend Data Stewardship, you can set the semantic type for the column containing the zip codes to this new compound type, Zip_codes in this example.

Before you begin

All the child semantic types you want to use in the compound type are created and published.

Procedure

  1. On the homepage, click SEMANTIC TYPES > ADD SEMANTIC TYPE.
  2. Enter a name and a description for the new semantic type.
  3. From the Type list, select the compound type.
  4. Keep the Use for validation switch activated.

    This compound type is then used to determine which values are considered valid or invalid when it is applied to a given column. You can see the result of this validation in the quality bar of each column in your datasets.

    In this example, if you were to deactivate the switch, the compound type would only be used for data discovery, and no value would be considered invalid.

  5. From the Children types list, select the semantic types you want to group in this compound type.
  6. Click SAVE AND PUBLISH to send the semantic type to the Talend Dictionary Service server and make it available to the system.
    Clicking SAVE AS DRAFT stores the new type on the server without propagating it to the system. The new type cannot be used until it is published. For example, if you have new semantic types to deploy as part of a new project, you can create them and save them as drafts ahead of time, then publish them only on the day the project goes live.
  7. Go back to Talend Data Stewardship and create the data model for the customer data.
    The new semantic type Zip_codes is now available in the list of semantic types, and you can set it for the column containing the zip codes.

Results

When you load the customer data into Talend Data Stewardship, the data is matched and validated against the Zip_codes compound type you created. Each value is evaluated against the first child type. If it matches, it is not evaluated against the other referenced types; otherwise it is evaluated against the next child type, and so on.
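To illustrate how the Use for validation switch from step 4 affects this result, here is a minimal sketch that counts the invalid values in a column. The `count_invalid` function, the patterns and the sample values are hypothetical and simplified; the actual quality bar is computed by the product, not by this code.

```python
import re

# Hypothetical, simplified child patterns for a Zip_codes compound type.
ZIP_PATTERNS = [
    re.compile(r"\d{5}"),                              # five-digit codes
    re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),   # simplified UK postcode
]


def count_invalid(values, use_for_validation=True):
    """Count the values flagged as invalid in a column.

    With the switch off, the type is used for discovery only,
    so no value is flagged as invalid.
    """
    if not use_for_validation:
        return 0
    invalid = 0
    for value in values:
        # any() stops at the first child pattern that matches.
        if not any(p.fullmatch(value) for p in ZIP_PATTERNS):
            invalid += 1
    return invalid


column = ["10115", "SW1A 1AA", "ABC", "75008"]
print(count_invalid(column))                            # 1
print(count_invalid(column, use_for_validation=False))  # 0
```

In this sketch only "ABC" fails every child pattern, so it is the single invalid value when validation is active.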