Adding a new compound semantic type - Cloud

Talend Cloud Data Inventory User Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Inventory
Content: Administration and Monitoring > Managing connections; Data Governance; Data Quality and Preparation > Enriching data; Data Quality and Preparation > Identifying data; Data Quality and Preparation > Managing datasets
Last publication date: 2024-02-28

You can create a compound semantic type to group other semantic types that are published on the Talend Dictionary Service server, and add it to the list of recognized data types.

You can mix any semantic types when creating a compound type. A compound semantic type can also reference other compound types, provided that all the child types are already published.

In this example, you need to prepare a file containing information about customers from the United States, the United Kingdom, Germany, and France. One of the columns in this dataset contains postal codes from these different countries, which therefore have different formats. In this case, the application applies the single semantic type that best matches the values in the column, US Postal code for example. As a result, the rest of the data, namely the German, French, and British postal codes, is considered invalid.

To handle this situation, you will create a compound type that groups the semantic types used to validate postal codes from each country.
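The idea behind a compound type can be sketched in a few lines of Python: a value is accepted as soon as it matches any one of the child types. This is only an illustration, not how Talend Dictionary Service is implemented. The regular expressions below are simplified assumptions rather than the definitions of the published semantic types, the function names are hypothetical, and only two child types are shown for brevity.

```python
import re

# Simplified, illustrative patterns (assumptions, not Talend's definitions).
US_ZIP = re.compile(r"\d{5}(-\d{4})?")
UK_POSTCODE = re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}")


def matches(pattern, value):
    """Return True when the whole value matches a single type's pattern."""
    return pattern.fullmatch(value) is not None


def compound_matches(children, value):
    """Return True when the value matches at least one child type."""
    return any(matches(pattern, value) for pattern in children)


def invalid_values(values, is_valid):
    """Collect the values that the given validator rejects."""
    return [value for value in values if not is_valid(value)]


column = ["10001", "94105-1234", "SW1A 1AA", "M1 1AE", "ABC"]

# With a single type, the British codes are rejected along with the bad value.
invalid_single = invalid_values(column, lambda v: matches(US_ZIP, v))

# With the compound type, only the genuinely malformed value is rejected.
invalid_compound = invalid_values(
    column, lambda v: compound_matches([US_ZIP, UK_POSTCODE], v)
)
```

In this sketch, validating the column against the US pattern alone rejects both British codes, whereas the compound validator rejects only the malformed value.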

Before you begin

All the semantic types that you want to group under the compound type have been published.

Procedure

  1. From the left panel of the homepage, open the Semantic type view.
  2. Click the Add semantic type button.
  3. In the Name field, enter Postal code.
  4. In the Description field, enter American, British, German and French postal codes.
  5. In the Type drop-down list, select Compound type.
  6. Keep the Use for validation switch activated.

    This compound type will be used to define which values are considered valid or invalid when it is applied to a given column. The result of this validation process can be seen in the quality bar of each column in your datasets.

    In this example, if you were to deactivate the switch, the compound type would only be used for data discovery, and no value would be considered invalid.

  7. From the Children types drop-down list, select the semantic types you want to group under this Postal code compound type.
    Selection of semantic types in the new compound type.
  8. Click Save and publish to send the new compound type to the Talend Dictionary Service server and make it available to the Talend Cloud Data Inventory users.

    Clicking Save as draft means that the semantic type will be stored in Talend Dictionary Service, but will not be broadcast to the Talend Cloud applications. This allows you to choose the moment when you want to make your semantic types public.

    The Postal code type is now available in the list of semantic types with the status set as Published.

    The change in semantic types takes effect immediately in Talend Cloud Data Inventory for every new dataset that you create. For existing datasets, you need to refresh the sample so that the quality is recalculated with the new, better suited semantic type.

  9. Go back to your dataset containing the postal codes from several countries.
  10. Click the Refresh sample button.
    Location of the Refresh button in the dataset overview.

Results

Your data is now matched with the Postal code compound type that you created manually in Talend Dictionary Service. From now on, when you import new datasets containing postal codes, they will automatically be matched with the proper type.
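The effect of the Use for validation switch from step 6 can also be sketched in Python. This is a rough illustration under stated assumptions: the combined pattern is a simplified stand-in for the Postal code compound type, and `count_invalid` is a hypothetical helper, not part of any Talend API.

```python
import re

# Simplified stand-in for the Postal code compound type (an assumption).
POSTAL_CODE = re.compile(r"\d{5}(-\d{4})?|[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}")


def count_invalid(values, pattern, use_for_validation=True):
    """Count the values flagged as invalid for a column.

    When use_for_validation is False, the type serves only for data
    discovery, so no value is flagged as invalid.
    """
    if not use_for_validation:
        return 0
    return sum(1 for value in values if pattern.fullmatch(value) is None)


column = ["10001", "SW1A 1AA", "ABC", "12"]

invalid_with_switch_on = count_invalid(column, POSTAL_CODE)
invalid_with_switch_off = count_invalid(column, POSTAL_CODE, use_for_validation=False)
```

With the switch activated, the two malformed values count as invalid. With the switch deactivated, nothing is flagged, which mirrors the discovery-only behavior described in the procedure.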