This section explains how to type simple elements.
Simple elements can use one of the default XML types:
Or one of the provided custom types:
For a complete list of types and the mapping to the physical storage, see MDM data model to RDBMS mapping.
By properly typing your simple elements, you define a set of validation rules with which your master data must comply.
A simple example: you have a date of birth field in an entity.
You could hold this date in a string field, but:
- There would be no validation that a correct date was entered at the MDM application
level. Of course, if data is ingested using ETL or services, the validation could be within
the integration logic.
- In the Talend MDM Web UI, you would not see
the date picker widget.
- The date would be stored as a string (VARCHAR) in the physical storage.
- You would not be able to perform date comparison searches.
The obvious choice here is to type the date of birth field as a date, which can be
formatted for display in your chosen format (see the MDM training). However, if your source
data also contains unknown, N/A or default placeholder dates, these will not validate as
date values, with the possible exception of the placeholder dates. This is discussed in
the Cardinality section.
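To make the difference concrete, here is a minimal XSD sketch of the two options. The entity name Person and the element names are hypothetical and chosen only for illustration; they do not come from an actual Talend model.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical entity and element names, for illustration only -->
  <xsd:element name="Person">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Id" type="xsd:string"/>
        <!-- Typed as a string: any text is accepted, including "N/A" or "31/02/1980" -->
        <xsd:element name="DateOfBirthAsText" type="xsd:string"/>
        <!-- Typed as a date: only valid xsd:date values (YYYY-MM-DD) pass validation -->
        <xsd:element name="DateOfBirth" type="xsd:date"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

With the second declaration, a value such as N/A is rejected by schema validation rather than being silently stored.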
The following example shows a surname field, which does need to hold string data.
Initially, it would seem sensible to use the 'string' type within your model, but this
approach leads to a problem caused by the use of an RDBMS as your physical storage.
You can see that in the example above, the default string type has been used, and in the
physical storage the x_last field is created as a VARCHAR(255). Here is what happens when you try to
enter data with more than 255 characters:
You get an error in the Talend MDM Web UI and a
database error in the Talend MDM Server log.
You can create a new simple type: 'string255_T', set a length restriction for this type, and
apply it to your 'last' element:
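In XSD terms, such a type could look like the following sketch. It assumes the length restriction is expressed with the standard maxLength facet, and it declares 'last' as a top-level element only to keep the fragment self-contained; in a real model the element sits inside its entity.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Custom simple type: a string limited to 255 characters -->
  <xsd:simpleType name="string255_T">
    <xsd:restriction base="xsd:string">
      <xsd:maxLength value="255"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- The 'last' element now uses the restricted type instead of xsd:string -->
  <xsd:element name="last" type="string255_T"/>
</xsd:schema>
```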
And then try your large string save test again:
Your 'string255_T' type gives you a red box and exclamation mark to indicate an error before you
hit save, a tooltip giving details of the error, and a more user-friendly error if you click Save.
Of course, you can use this principle to enforce length restrictions other than 255 characters, and you can see the effect this has on the physical storage.
String lengths greater than 255 characters are a special case.
The use of the MySQL LONGTEXT type is automatic for any value greater than 255. The
exception to this is Oracle, where a VARCHAR up to length 4000 is used. In most scenarios, it
is unlikely for anything that can be described as master data to have this many characters.
Having examined string lengths, you can conclude that strings in a Talend MDM model should always be given a maximum length to:
- Enforce business rules as to the maximum length of a given field
- Give meaningful error messages
- Optimise the physical storage: there is no point in reserving 255 characters in the database if you
never populate more than 10 characters, as this would be a waste of physical storage
When defining custom simple types, you can also add rules, known as facets, for the following:
- length: set a fixed length for the element
- minLength: minimum length
- pattern: the element has to conform to a regular expression pattern
- enumeration: a fixed set of values (for more information, see the Rules section)
- whiteSpace: preserve, replace or collapse
When setting a facet value, you may want to set a facet message to give a more
business-friendly tooltip in the user's language of choice.
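As an illustration of how these facets combine, the sketch below defines three custom simple types in plain XSD. The type names (countryCode_T, status_T, collapsedText_T) and the values are hypothetical, and facet messages are not shown because they are configured in the Talend tooling rather than through these standard facets.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: exactly two upper-case letters (length + pattern) -->
  <xsd:simpleType name="countryCode_T">
    <xsd:restriction base="xsd:string">
      <xsd:length value="2"/>
      <xsd:pattern value="[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Hypothetical type: only the listed values are valid (enumeration) -->
  <xsd:simpleType name="status_T">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="ACTIVE"/>
      <xsd:enumeration value="INACTIVE"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Hypothetical type: whitespace is collapsed, then at least one character is required -->
  <xsd:simpleType name="collapsedText_T">
    <xsd:restriction base="xsd:string">
      <xsd:whiteSpace value="collapse"/>
      <xsd:minLength value="1"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```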
Foreign key fields
A foreign key field should always be a string. An optional best practice is to set a length
restriction on an FK field. However, remember that an MDM foreign key is bounded by square
brackets, so any length restriction should take the two extra characters into account.
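The arithmetic can be sketched as follows, assuming a single-field key in the referenced entity that is at most 10 characters long. Both the 10-character limit and the type name fk12_T are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Assumption: the referenced entity's key is at most 10 characters long -->
  <xsd:simpleType name="fk12_T">
    <xsd:restriction base="xsd:string">
      <!-- 10 characters for the key value + 2 for the enclosing square brackets -->
      <xsd:maxLength value="12"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

Restricting the FK field to 10 characters instead would reject a full-length key once the brackets are added.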