You can create a compound semantic type to group other semantic types that are published on the Talend Dictionary Service server, and add it to the list of recognized data types in Talend Data Preparation.
You can combine any kind of semantic type in a compound type. A compound type can also reference other compound types, provided that all of its child types are already published.
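The publication rule for nested compound types can be sketched as a simple check. This is a minimal illustration under a hypothetical data model: the `SemanticType` class, the `DRAFT`/`PUBLISHED` status strings and the helper functions are assumptions made for this example, not part of any Talend API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SemanticType:
    """Hypothetical, simplified model of a dictionary entry (not a Talend API)."""
    name: str
    status: str = "DRAFT"  # assumed lifecycle values: "DRAFT" or "PUBLISHED"
    children: list[SemanticType] = field(default_factory=list)  # empty for non-compound types


def unpublished_children(compound: SemanticType) -> list[str]:
    """Names of the direct child types that are not published yet."""
    return [child.name for child in compound.children if child.status != "PUBLISHED"]


def can_publish(compound: SemanticType) -> bool:
    """A compound type may be published only when all its children are published.

    A child can itself be a compound type; under this model it could only
    have been published once its own children were, so checking the direct
    children is enough.
    """
    return not unpublished_children(compound)


# Usage: a nested compound type that is still a draft blocks publication.
us = SemanticType("US postal code", "PUBLISHED")
uk = SemanticType("UK postcode", "PUBLISHED")
fr = SemanticType("FR postal code", "PUBLISHED")
de = SemanticType("DE postal code", "PUBLISHED")
eu = SemanticType("EU postal code", children=[fr, de])  # hypothetical compound, still a draft
postal = SemanticType("Postal code", children=[us, uk, eu])

print(can_publish(postal))           # False
print(unpublished_children(postal))  # ['EU postal code']
eu.status = "PUBLISHED"
print(can_publish(postal))           # True
```

Checking only the direct children keeps the sketch short; it relies on the same rule having been enforced when each nested compound type was itself published.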
In this example, you need to prepare a file containing information about customers from the United States, the United Kingdom, Germany and France. One of the columns in this dataset contains postal codes from these different countries and, as a consequence, in different formats. In this case, Talend Data Preparation applies the semantic type that best matches the values in the column, the US postal code type for example. As a result, the rest of the data, the German, French and British postal codes, is considered invalid.
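The effect of picking a single best-matching type can be sketched as follows. This is a simplified illustration, not Talend's detection algorithm: it assumes "matches the most" means "fully matches the largest share of the column's values", and the two regular expressions are hypothetical, simplified stand-ins for the US and UK postal code types (only two types are used to keep the example small).

```python
import re

# Hypothetical, simplified patterns for illustration only; they are not
# Talend's actual semantic type definitions.
CANDIDATE_TYPES = {
    "US_POSTAL_CODE": re.compile(r"\d{5}(-\d{4})?"),
    "UK_POSTCODE": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),
}


def match_share(values, pattern):
    """Fraction of values that fully match the pattern."""
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


def best_matching_type(values, candidates=CANDIDATE_TYPES):
    """Return the name of the candidate type that matches the most values."""
    if not values:
        raise ValueError("cannot infer a type from an empty column")
    return max(candidates, key=lambda name: match_share(values, candidates[name]))


def invalid_values(values, type_name, candidates=CANDIDATE_TYPES):
    """Values that the chosen single type rejects."""
    pattern = candidates[type_name]
    return [v for v in values if not pattern.fullmatch(v)]


column = ["10001", "94105", "60601-1234", "SW1A 1AA", "M1 1AE"]
chosen = best_matching_type(column)
print(chosen)                          # US_POSTAL_CODE
print(invalid_values(column, chosen))  # ['SW1A 1AA', 'M1 1AE']
```

Because only one type wins, every value written in another country's format ends up in the invalid list, even though it is a perfectly correct postal code.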
To adapt Talend Data Preparation to this situation, you will create a compound type that groups the semantic types used to validate these postal codes.
Before you begin
All the semantic types that you want to group under the compound type have been published.
- Open the Semantic types view from the left panel of the Talend Data Preparation homepage and click Add semantic type.
- In the Name field, enter Postal code.
- In the Description field, enter American, British, German and French postal codes.
- In the Type drop-down list, select Compound type.
- Keep the Use for validation switch activated.
This compound type will be used to determine which values are considered valid or invalid when it is applied to a given column. The result of this validation is shown in the quality bar of each column in your datasets.
In this example, if you were to deactivate the switch, the compound type would only be used for data discovery, and no value would be considered invalid.
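The role of the Use for validation switch can be sketched with a small model. This is a hypothetical illustration: the `CompoundType` class and the child patterns are assumptions, and the rule that a value is valid as soon as at least one child type matches it is a simplified reading of how a compound type groups its children, not a description of Talend's internal implementation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class CompoundType:
    """Hypothetical model of a compound semantic type (not a Talend API).

    Assumption: a value is valid when at least one child type matches it.
    """
    name: str
    children: dict[str, re.Pattern[str]]
    use_for_validation: bool = True

    def matching_children(self, value: str) -> list[str]:
        """Names of the child types whose pattern fully matches the value."""
        return [n for n, p in self.children.items() if p.fullmatch(value)]

    def is_valid(self, value: str) -> bool:
        if not self.use_for_validation:
            return True  # discovery only: no value is ever flagged as invalid
        return bool(self.matching_children(value))

    def invalid_values(self, values: list[str]) -> list[str]:
        return [v for v in values if not self.is_valid(v)]


# Simplified, hypothetical child patterns; FR and DE overlap with each other
# and with US, which is harmless because any single match is enough.
CHILDREN = {
    "US_POSTAL_CODE": re.compile(r"\d{5}(-\d{4})?"),
    "UK_POSTCODE": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),
    "DE_POSTAL_CODE": re.compile(r"\d{5}"),
    "FR_POSTAL_CODE": re.compile(r"\d{5}"),
}

postal_code = CompoundType("Postal code", CHILDREN)
column = ["10001", "SW1A 1AA", "75008", "10115", "60601-1234", "n/a"]
print(postal_code.invalid_values(column))  # ['n/a']

discovery_only = CompoundType("Postal code", CHILDREN, use_for_validation=False)
print(discovery_only.invalid_values(column))  # []
```

With validation enabled, only the value that matches none of the child types is flagged; with the switch off, the type still describes the column but nothing is reported as invalid.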
- From the Children types drop-down list, select the semantic types you want to group under this Postal code compound type.
- Click Save and publish to send the new compound type to the Talend Dictionary Service server and make it available to Talend Data Preparation users.
Clicking Save as draft stores the semantic type in Talend Dictionary Service without broadcasting it to the Talend Web applications. This lets you choose when to make your semantic types public.
The Postal code type is now available in the list of semantic types, with its status set to Published.
Changes to semantic types take effect immediately in Talend Data Preparation for every new dataset that you import. For existing datasets, you need to change the column type manually or reimport the dataset.
- Go back to your dataset containing the postal codes from several countries.
- Click the menu icon in the header of the column containing the postal codes and change the column type to Postal code.
Your data is now matched with the Postal code compound type that you created manually in Talend Dictionary Service. From now on, when you import new datasets containing postal codes, they will automatically be matched with the proper type.