You can create a compound semantic type to group other semantic types that are published on the Talend Dictionary Service server, and add it to the list of recognized data types.
You can combine any kinds of semantic types in a compound type. A compound type can also reference other compound types, provided that all of its child types are already published.
In this example, you need to prepare a file containing information about customers from the United States, the United Kingdom, Germany, and France. One of the columns in this dataset contains postal codes from these different countries, and therefore in different formats. In this case, the application applies the semantic type that best matches the values in the column, for example US Postal code. As a result, the rest of the data, that is the German, French, and British postal codes, is considered invalid.
To handle this situation, you will create a compound type that groups the semantic types used to validate the postal codes of each country.
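The difference between applying a single best-matching type and applying a compound type can be sketched in Python. This is a minimal illustration, not how Talend Dictionary Service is implemented. It assumes that the application simply picks the type matching the most values, and that a compound type accepts a value when at least one of its child types accepts it. The child types are hypothetical and modelled as small sets of known codes, standing in for the real validation rules of each published type. The helper names `best_single_type`, `compound` and `invalid_values` are hypothetical too.

```python
from typing import Callable, Dict, List

# A validator says whether one cell value fits a semantic type.
Validator = Callable[[str], bool]

# Hypothetical child types: tiny sets of known codes stand in for the real
# rules (regular expressions, dictionaries, ...) of each published type.
US_CODES = frozenset({"10001", "94105", "60601"})
UK_CODES = frozenset({"SW1A 1AA", "EC1A 1BB"})
DE_CODES = frozenset({"10115"})
FR_CODES = frozenset({"75008"})

CHILD_TYPES: Dict[str, Validator] = {
    "US Postal code": lambda v: v in US_CODES,
    "UK Postal code": lambda v: v in UK_CODES,
    "DE Postal code": lambda v: v in DE_CODES,
    "FR Postal code": lambda v: v in FR_CODES,
}


def best_single_type(values: List[str], types: Dict[str, Validator]) -> str:
    """Return the name of the type that matches the largest number of values."""
    return max(types, key=lambda name: sum(types[name](v) for v in values))


def compound(children: Dict[str, Validator]) -> Validator:
    """A value is valid for the compound type if any child type accepts it."""
    return lambda value: any(check(value) for check in children.values())


def invalid_values(values: List[str], validator: Validator) -> List[str]:
    """List the values that the given type rejects."""
    return [v for v in values if not validator(v)]


column = ["10001", "94105", "60601", "SW1A 1AA", "10115", "75008"]

single = best_single_type(column, CHILD_TYPES)
print(single, invalid_values(column, CHILD_TYPES[single]))
# US Postal code ['SW1A 1AA', '10115', '75008']

postal_code = compound(CHILD_TYPES)
print("Postal code", invalid_values(column, postal_code))
# Postal code []
```

With only the US type applied, the British, German and French codes are flagged as invalid. With the compound type, every code is accepted by one of its children.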
Before you begin
All the semantic types that you want to group under the compound type have been published.
Procedure
- From the left panel of the homepage, open the Semantic type view.
- Click the Add semantic type button.
- In the Name field, enter Postal code.
- In the Description field, enter American, British, German and French postal codes.
- In the Type drop-down list, select Compound type.
- Keep the Use for validation switch activated.
This compound type will be used to define which values are considered valid or invalid when it is applied to a given column. The result of this validation process is shown in the quality bar of each column in your datasets.
In this example, if you deactivated the switch, the compound type would only be used for data discovery, and no value would be considered invalid.
- From the Children types drop-down list, select the semantic types you want to group under this Postal code compound type.
- Click Save and publish to send the new compound type to the Talend Dictionary Service server and make it available to Talend Cloud Data Inventory users.
Clicking Save as draft stores the semantic type in Talend Dictionary Service without broadcasting it to the Talend Cloud applications. This allows you to choose when to make your semantic types public.
The Postal code type is now available in the list of semantic types, with its status set to Published.
The change in semantic types takes effect immediately in Talend Cloud Data Inventory for every new dataset that you create. For existing datasets, you need to refresh the sample so that the quality is recalculated with the new, more suitable type.
- Go back to your dataset containing postal codes from several countries.
- Click the Refresh sample button.
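Refreshing the sample recalculates the quality of the column against its semantic type. The following Python sketch is a hypothetical model of that calculation, not Talend's code. It counts valid and invalid values in a sample. It also assumes that when Use for validation is deactivated, no value is flagged as invalid, and it counts such values as valid; how the real quality bar reports them is an assumption. The function names `is_postal_code` and `quality_bar` and the sample codes are hypothetical.

```python
from typing import Callable, Dict, Iterable

# A validator says whether one cell value fits the column's semantic type.
Validator = Callable[[str], bool]

# Hypothetical stand-in for the Postal code compound type.
KNOWN_POSTAL_CODES = frozenset({"10001", "SW1A 1AA", "10115", "75008"})


def is_postal_code(value: str) -> bool:
    """Accept a value if it is one of the known postal codes."""
    return value in KNOWN_POSTAL_CODES


def quality_bar(values: Iterable[str], validator: Validator,
                use_for_validation: bool = True) -> Dict[str, int]:
    """Count valid and invalid values in a refreshed sample.

    With use_for_validation=False the type only serves data discovery, so no
    value is flagged as invalid; this sketch counts every value as valid then.
    """
    counts = {"valid": 0, "invalid": 0}
    for value in values:
        if not use_for_validation or validator(value):
            counts["valid"] += 1
        else:
            counts["invalid"] += 1
    return counts


sample = ["10001", "SW1A 1AA", "10115", "75008", "N/A"]
print(quality_bar(sample, is_postal_code))         # {'valid': 4, 'invalid': 1}
print(quality_bar(sample, is_postal_code, False))  # {'valid': 5, 'invalid': 0}
```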
Results
Your data is now matched with the Postal code compound type that you manually created in Talend Dictionary Service. From now on, new datasets that you import containing postal codes will automatically be matched with the proper type.