Best Practice: MDM Data Models

author
Irshad Burtally
EnrichVersion
6.4
6.3
6.2
6.1
EnrichProdName
Talend Data Fabric
Talend Open Studio for MDM
Talend MDM Platform
task
Data Governance > Modeling data
EnrichPlatform
Talend MDM Server
Talend Studio

This article describes MDM governance in general and presents a non-exhaustive list of best practices for working with Talend MDM.

Data Models Naming conventions

Data Models must always have the same name as their related data containers.

Entities

Terminology

Entities must be named with the most generic terminology possible, so that the name best describes the business object being mastered and covers all functional requirements. In particular, avoid names based on an application or source system (e.g. SAPEntity) or on an organizational unit (e.g. HREntity).

Naming conventions

Do not use any special character or number in the entity name.

Try to keep the entity name to 20 characters or fewer, so that the XPath queries used later in the MDM components do not become too long.
  • Use upper CamelCase to improve the readability of the XPath queries.
  • Leave the entity type anonymous.
  • Try to distinguish the different kinds of entities with prefixes, as the data model editor can filter entities based on a regular expression. Common kinds of entities are:
    • Master entities (MST_.*)
    • Reference entities (REF_.*)
    • Cross-referencing entities (XREF_.*)
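
The prefix-based filtering described above can be sketched in a few lines of Python. This is only an illustration: the entity names are hypothetical, and requiring the expression to cover the whole name is an assumption about how the editor applies it.

```python
import re

# Filter patterns taken from the list above.
ENTITY_KINDS = {
    "master": r"MST_.*",
    "reference": r"REF_.*",
    "cross-reference": r"XREF_.*",
}

def filter_entities(names, pattern):
    """Return the entity names that the pattern matches in full."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]

# Hypothetical entity names, for illustration only.
entities = ["MST_Product", "MST_Customer", "REF_Country", "XREF_Product"]
masters = filter_entities(entities, ENTITY_KINDS["master"])
references = filter_entities(entities, ENTITY_KINDS["reference"])
```

Matching the whole name matters here: a substring search for REF_.* would also pick up XREF_Product, so distinct prefixes are only useful if the expression is anchored at the start of the name.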

Elements

Terminology

As for entities, elements should be named in the most generic way, to comply with every business need related to this piece of master data.

Labels

Always set a label for the default language on an entity element, even if the label value is equal to the element name.

Security

All the elements of a datamodel should have at least the Write Access security annotation for the System_Admin role. This gives the administrator full access to the whole datamodel.

Facets

When a facet is defined on an element type, always set a facet message to warn the user in case of an error. Every element that has a facet should also have its documentation set, so that the user understands in the WebUI which constraint applies to the element.

Naming conventions

Do not use any special character or number in the element name.

Try to keep the element name to 20 characters or fewer, so that the XPath queries used later in the MDM components do not become too long.

Never rewrite the entity name in the element name (e.g. ProductLabel). This is not useful, and makes the XPath longer.

Use upper CamelCase to improve the readability of the XPath queries.
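
As a rough sketch, the element naming rules above could be checked automatically. The function names below are hypothetical and are not part of Talend MDM; the upper CamelCase check is simplified to "starts with an uppercase letter", which is an assumption.

```python
import re

MAX_LENGTH = 20  # limit suggested in this section

def check_element_name(entity, element):
    """Return the list of naming problems found for an element name."""
    problems = []
    if not re.fullmatch(r"[A-Za-z]+", element):
        problems.append("contains a special character or a number")
    if len(element) > MAX_LENGTH:
        problems.append("longer than 20 characters")
    # Simplified CamelCase check: only the first letter is tested.
    if not element[:1].isupper():
        problems.append("not upper CamelCase")
    if element.startswith(entity):
        problems.append("repeats the entity name")
    return problems

def xpath(entity, element):
    """Build the XPath of an element directly under its entity."""
    return f"{entity}/{element}"
```

For a hypothetical Product entity, ProductLabel is reported as repeating the entity name, while Label passes and gives the shorter XPath Product/Label.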

Primary Keys

Composite primary keys

Even though it's possible to define a composite primary key, this should be avoided to ease data management in the jobs. If the primary key is a composition of many elements, use a surrogate key instead.

Types

For presentation reasons, AUTO_INCREMENT should be preferred to UUID type.

Non-MDM controlled primary keys

Be very careful with non-MDM controlled primary keys. They should be reserved for very static entities, when you are 100% sure the primary key value will never change.

Using them can ease updates of the entity, because you already know the primary key value and do not have to look it up.

Naming conventions

Primary keys should be named using the following convention: Id + entity name (e.g. IdPerson). This makes the element name easy to retrieve in the Integration perspective: the developer can guess it instead of constantly checking the datamodel.

Note: Using such a naming convention will enable you to write highly dynamic jobs, as you'll always know how to get the primary key name.
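
To illustrate why this convention helps when writing dynamic jobs, here is a minimal sketch that derives the key element name and its XPath from the entity name alone. The helper names and the entity names are hypothetical.

```python
def primary_key_name(entity):
    """Build the key element name following the Id + entity name convention."""
    return f"Id{entity}"

def primary_key_xpath(entity):
    """Build the XPath of the key element, e.g. Person/IdPerson."""
    return f"{entity}/{primary_key_name(entity)}"

# Hypothetical entity names, for illustration only.
paths = [primary_key_xpath(name) for name in ["Person", "Store"]]
```

The same two lines work for any entity that follows the convention, which is what allows a single generic job to handle several entities.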

Foreign Keys

Foreign Key infos

A foreign key should always have a Foreign Key Info set on it. The technical foreign key value alone may not carry enough business information to help the user choose a record in the foreign key pickers (except for entities that have meaningful primary keys).

Foreign Key filters

When Foreign Key filters are required, the filtering element should always be placed above the filtered element in the model. This increases the chances that the filter is set, as users usually populate fields from top to bottom.

Breaking into separate tabs

By default, foreign entities should not be broken into separate tabs; do so only when there is an explicit business need.

In most cases, only 1-to-many relations need to be split into separate tabs.

Naming conventions

Foreign keys should be named using the following convention: Fk + foreign entity name (e.g. FkStore).

This makes the element name easy to retrieve in the Integration perspective, and tells the developer that MDM.createFK() will be required without having to check the datamodel.
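
The following sketch shows how the Fk prefix lets a generic routine find the elements that need foreign key handling from their names alone. It does not call or reproduce MDM.createFK(); the helper names and the element list are hypothetical.

```python
def is_foreign_key(element_name):
    """True when the name follows the Fk + foreign entity name convention."""
    return (
        len(element_name) > 2
        and element_name.startswith("Fk")
        and element_name[2].isupper()
    )

def foreign_entity(element_name):
    """Return the foreign entity name encoded in the element name."""
    if not is_foreign_key(element_name):
        raise ValueError(f"{element_name!r} is not named like a foreign key")
    return element_name[2:]

# Hypothetical elements of a Product entity.
elements = ["Label", "Price", "FkStore", "FkSupplier"]
needs_create_fk = [name for name in elements if is_foreign_key(name)]
```

Requiring an uppercase letter after the prefix is a design choice made here to avoid treating an ordinary element whose name merely starts with "Fk" as a foreign key.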

Simple Custom Types

Reusability

Simple custom types should be created only when the type is used at least twice in the model. They are also appropriate when a change to the type must be propagated to every element that references it.

Common reusable types include email and phone number. In every other case, use anonymous types instead.

Naming conventions

Simple custom types should be named using the following convention: type description + Type (e.g. EmailType).

When an enumeration facet is used on a simple type, the naming convention becomes: enumeration description + Enum (e.g. GenderEnum).

Complex Custom Types

Naming Conventions

For repeatable complex types (e.g. 1..many), use the following naming convention: pluralized type description + List (e.g. ModulesList).

All complex custom types should be named.

Inheritance

Naming conventions

A supertype should be named using the following convention: type description + Specialization (e.g. ContactSpecialization).

Types that inherit from the supertype should then be named as follows: supertype name + concrete specialization (e.g. ContactEmailSpecialization).

This makes the design of the type easier to understand in the Reusable Data Types tab, where it reads more clearly (e.g. ContactEmailSpecialization : ContactSpecialization).

Repeatable elements

When using Integrated Matching, the Swoosh algorithm works on table-like structures, not on trees (this is not specific to Swoosh, as many other algorithms do not address this either). This brings a few limitations:

  • Repeatable elements cannot be used.
  • An element cannot be used when at least one of its parent elements is repeatable.
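
The two limitations above amount to a simple tree walk: an element is excluded when it, or any of its ancestors, is repeatable. The sketch below illustrates that rule on a hypothetical Person model; the Element class is an assumption made for the example and is not a Talend MDM structure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Element:
    name: str
    repeatable: bool = False
    children: List["Element"] = field(default_factory=list)

def matchable_paths(element, prefix=""):
    """List leaf paths that are neither repeatable nor under a repeatable parent."""
    if element.repeatable:
        return []  # the element and its whole subtree are excluded
    path = f"{prefix}/{element.name}" if prefix else element.name
    if not element.children:
        return [path]
    paths = []
    for child in element.children:
        paths.extend(matchable_paths(child, path))
    return paths

# Hypothetical model, for illustration only.
person = Element("Person", children=[
    Element("FirstName"),
    Element("LastName"),
    Element("Addresses", children=[
        Element("Address", repeatable=True, children=[Element("City")]),
    ]),
])
usable = matchable_paths(person)
```

In this example only Person/FirstName and Person/LastName remain: City is not repeatable itself, but it sits under the repeatable Address element.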

Encapsulation

Repeatable elements should always be encapsulated in another element whose name is the plural of the repeatable element's name, for example:
<Person>
   <.../> 
   <Addresses> <!-- [0..1] - Encapsulating element -->
      <Address> <!-- [0..many] - Repeatable element -->
      </Address>
   </Addresses>
</Person>
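
As an illustration, a record with this shape can be built with Python's standard xml.etree.ElementTree module. The add_repeatable helper, the City element and its values are hypothetical and only serve the example.

```python
import xml.etree.ElementTree as ET

def add_repeatable(parent, container_name, item_name, items):
    """Wrap repeated items in a single encapsulating element."""
    container = ET.SubElement(parent, container_name)
    for values in items:
        item = ET.SubElement(container, item_name)
        for tag, text in values.items():
            ET.SubElement(item, tag).text = text
    return container

person = ET.Element("Person")
add_repeatable(person, "Addresses", "Address",
               [{"City": "Paris"}, {"City": "Lyon"}])
xml_text = ET.tostring(person, encoding="unicode")
```

With the encapsulating element in place, all occurrences are reachable through the single path Person/Addresses/Address, and the container can be present even when it holds no Address.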

Nesting

To ensure compatibility with the Partial Update feature, it is not advised to have more than one level of repeatable elements.

To make your datamodel less complex, try not to nest repeatable elements.
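
A model can be checked against this recommendation by counting how many repeatable elements are stacked on a single path. The sketch below assumes a hypothetical representation where each node is a (name, repeatable, children) tuple; a result greater than 1 means repeatable elements are nested.

```python
def repeatable_depth(node):
    """Return the largest number of repeatable elements found on one path."""
    _name, repeatable, children = node
    deepest_child = max(
        (repeatable_depth(child) for child in children), default=0
    )
    return deepest_child + (1 if repeatable else 0)

# Hypothetical models, for illustration only.
flat = ("Person", False, [
    ("Addresses", False, [("Address", True, [("City", False, [])])]),
])
nested = ("Person", False, [
    ("Addresses", False, [
        ("Address", True, [
            ("Phones", False, [("Phone", True, [])]),
        ]),
    ]),
])
```

Here the flat model has a depth of 1, while the nested model has a depth of 2 because Phone repeats inside the repeatable Address.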

Multiple repeatable elements

To ease data management in the Integration perspective, it is advised not to have many repeatable elements on the same entity.