In this example, you create a data model to determine the structure of the data to be managed in the Site deduplication campaign. This campaign enables data stewards to label near duplicates in a data sample extracted by a Talend Job.
Talend Data Stewardship is data-model aware, which makes syntactic and semantic validation of data possible. You define the attributes in the data model and select the type of each attribute from a predefined set of standard or semantic types.
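To illustrate what data-model awareness means, the sketch below models a data model as a list of typed attributes and checks record values syntactically against each attribute's type. It is a minimal illustration under stated assumptions, not Talend's implementation. The type names ("text", "integer", "date"), the classes `Attribute` and `DataModel`, the method `invalid_fields`, and the sample identifiers are all hypothetical. Semantic types, which rely on the Talend Dictionary Service, are not modeled.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Optional


def _is_integer(value: str) -> bool:
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_iso_date(value: str) -> bool:
    try:
        date.fromisoformat(value)  # expects YYYY-MM-DD
        return True
    except ValueError:
        return False


# Hypothetical standard types: each maps to a syntactic check on a raw string value.
# These names and checks are illustrative, not Talend Data Stewardship's own definitions.
STANDARD_TYPES: dict[str, Callable[[str], bool]] = {
    "text": lambda value: True,
    "integer": _is_integer,
    "date": _is_iso_date,
}


@dataclass
class Attribute:
    identifier: str               # technical identifier of the column
    type_name: str                # key into STANDARD_TYPES (hypothetical)
    name: Optional[str] = None    # optional display name
    description: str = ""

    @property
    def display_name(self) -> str:
        # Fall back to the technical identifier when no name is set.
        return self.name or self.identifier


@dataclass
class DataModel:
    name: str
    attributes: list[Attribute] = field(default_factory=list)

    def invalid_fields(self, record: dict[str, str]) -> list[str]:
        """Return identifiers of present fields whose value fails the attribute's type check."""
        return [
            attr.identifier
            for attr in self.attributes
            if attr.identifier in record
            and not STANDARD_TYPES[attr.type_name](record[attr.identifier])
        ]


# Usage with hypothetical attributes.
model = DataModel("Sites", [
    Attribute("site_id", "integer"),
    Attribute("opened", "date", name="Opening date"),
])
print(model.invalid_fields({"site_id": "12a", "opened": "2020-01-31"}))  # ['site_id']
print(model.attributes[0].display_name)  # site_id
```

Keeping the type checks in a lookup table separates the attribute definition from its validation rule, so adding another hypothetical type only requires one new entry.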
Procedure
- Select Data models > Add data model.
- Enter a name and a description for the new model in the Name and Description fields respectively. Mandatory fields are marked with * next to their names.
- In the Attributes section, define the columns you want to have in the data model as follows:
- In the Identifier field, enter the technical identifier for the first column.
- Enter a name and a description for the column in the corresponding fields, if needed.
The value of the Name field is the name displayed in the task list. If no name is set, the technical identifier is displayed instead.
- From the attribute type list, select the type of the column.
Standard and semantic types are integrated in the application by default.
- For standard types, additional fields are displayed depending on the type you select. These fields are optional and let you define constraints on the attribute, such as a minimum and/or maximum length or a pattern against which to validate the attribute.
To make sure the entire value matches your validation pattern, it is best practice to surround the pattern with ^ and $. Some examples:
- [A-Z] matches A and ABC.
- ^[A-Z]$ matches A but does not match ABC.
For Date and Timestamp columns, a date and time picker helps you enter the date and time in the right format.
- For semantic types, you can manage the types with the Talend Dictionary Service. The availability of this service depends on your license.
- Optionally, turn off the Allow empty values option to prevent empty fields from being uploaded. This option is enabled by default.
- Click Add attribute and repeat the above steps to create all the columns you need in the data model.
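The advice to surround a validation pattern with ^ and $ comes from the difference between searching for a pattern anywhere in a value and requiring it to match the whole value. The Python sketch below reproduces the two examples from the procedure. It assumes the validator behaves like an unanchored search, as Python's `re.search` does. The helper name `matches` is hypothetical, and Talend's own regex engine may differ in details.

```python
import re


def matches(pattern: str, value: str) -> bool:
    """Return True if the pattern is found anywhere in the value (unanchored search)."""
    return re.search(pattern, value) is not None


# Without anchors, one uppercase letter anywhere in the value is enough.
print(matches(r"[A-Z]", "A"))      # True
print(matches(r"[A-Z]", "ABC"))    # True

# With ^ and $, the whole value must be exactly one uppercase letter.
print(matches(r"^[A-Z]$", "A"))    # True
print(matches(r"^[A-Z]$", "ABC"))  # False
```

In Python, $ also matches just before a single trailing newline, so `re.fullmatch(pattern, value)` is the stricter option when validating values in Python code.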
The columns defined for the Site deduplication campaign used in this example hold information about childhood education centers in Chicago.
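The optional constraints available for standard types (minimum and maximum length, validation pattern) and the Allow empty values option can be pictured as a set of checks applied to each value. The sketch below is a minimal illustration, not Talend's validation logic. The `TextConstraints` class, the `violations` function, the message texts, and the five-digit postal code rule are hypothetical and not taken from the actual Site deduplication campaign. It also assumes that "empty" means an empty string.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextConstraints:
    # Hypothetical constraint set for a text attribute. All constraints are optional,
    # like the additional fields shown for standard types.
    min_length: Optional[int] = None
    max_length: Optional[int] = None
    pattern: Optional[str] = None
    allow_empty: bool = True  # mirrors "Allow empty values", enabled by default


def violations(value: str, c: TextConstraints) -> list[str]:
    """Return human-readable constraint violations for one value (empty list if valid)."""
    if value == "":
        # Assumption: an empty string is the only "empty" value.
        return [] if c.allow_empty else ["empty value not allowed"]
    problems = []
    if c.min_length is not None and len(value) < c.min_length:
        problems.append(f"shorter than {c.min_length}")
    if c.max_length is not None and len(value) > c.max_length:
        problems.append(f"longer than {c.max_length}")
    if c.pattern is not None and re.search(c.pattern, value) is None:
        problems.append(f"does not match {c.pattern}")
    return problems


# Hypothetical five-digit postal code attribute that rejects empty values.
zip_rule = TextConstraints(min_length=5, max_length=5, pattern=r"^[0-9]{5}$", allow_empty=False)
print(violations("60614", zip_rule))  # []
print(violations("606", zip_rule))    # ['shorter than 5', 'does not match ^[0-9]{5}$']
print(violations("", zip_rule))       # ['empty value not allowed']
```

Returning every violation instead of stopping at the first one makes it easier to see why a value was rejected.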