MDM data model to RDBMS mapping

author
François Huaulme
EnrichVersion
6.4
6.3
6.2
6.1
6.0
EnrichProdName
Talend MDM Platform
Talend Data Fabric
task
Data Governance > Modeling data
EnrichPlatform
Talend MDM Server

MDM data model to RDBMS mapping

This article describes how the MDM data model is mapped to a RDBMS database schema. It starts from a simple model that gets enriched to illustrate mapping features.
Environment

This article applies to MDM Server from version 5.2 onwards.

BasicsColumn names

This first section introduces basic concepts with the data model to database schema mapping.

Consider the following XML:

<Person>
	<id>1</id>
	<lastname>Smith</lastname>
	<firstname>John</firstname>
</Person>

The following database schema is created:

Person
x_id
x_lastname
x_firstname

This mapping is straightforward: all elements contained in the Person entity type are mapped to a column in a table named "Person". You may notice the "x_" prefix before each column name: adding this prefix ensures no column name conflicts with SQL reserved keywords ("count", "select"...). If you take a closer look at the generated table, you may notice additional columns:

Person
x_id
x_lastname
x_firstname
x_talend_taskid
x_talend_timestamp

Each column prefixed by "x_talend_" is a technical column not exposed to the end user: they don't exist in the user's data model. They are internally used to store task id (for the Data Stewardship Console) and the last modification timestamp. Although these values are not exposed to user, they are still used (for queries on the last modification time, for instance).

Column names might be shortened depending on the database. MDM has the following maximum length:

Database

Maximum length for SQL names

(column and table names)

Oracle 30
MySQL 64
SQL Server 128
Postgres 63
H2 No limit
Column typing

Each column of the table has a type that depends on the element type in the data model.

<Person>
	<id>1</id> <!-- xsd:integer -->
	<lastname>Smith</lastname> <!-- xsd:string --> 
	<firstname>John</firstname> <!-- xsd:string -->
</Person>

The column types are:

Person Type
x_id int
x_lastname varchar(255)
x_firstname varchar(255)
x_talend_taskid varchar(255)
x_talend_timestamp bigint(20)

Note: Technical fields follow the same typing rules as user-defined elements.

Here's the complete mapping for element type to SQL type:

Xml schema type SQL column type
string varchar(255)
boolean bit(1)
float float
double double
decimal decimal(19,2)
duration varchar(255)
dateTime datetime
time datetime
date datetime
hexBinary varchar(255)
base64Binary varchar(255)
anyURI varchar(255)
QName varchar(255)
integer int(11)
nonPositiveInteger int(11)
negativeInteger int(11)
long bigint(20)
int int(11)
short smallint(6)
byte tinyint(4)
nonNegativeInteger int(11)
unsignedLong bigint(20)
unsignedInt int(11)
unsignedShort smallint(6)
unsignedByte tinyint(4)
positiveInteger int(11)
Column constraints

Depending on data model definition, database mapping may enforce NOT NULL values. For instance:

<Person>
	<id>1</id> <!-- minOccurs=1, maxOccurs=1 -->
	<lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> 
	<firstname>John</firstname> <!-- minOccurs=0, maxOccurs=1 -->
</Person>

We can see only the "id" element is mandatory (i.e. minOccurs=1).

Starting from version 5.2, all "id" elements (all elements included in entity key) must be minOccurs="1".

No Person record can be created without a value in <id>: this is expressed with a NOT NULL constraint on the x_id column.

Person Type Constraint
x_id int NOT NULL
x_lastname varchar(255)  
x_firstname varchar(255)  
x_talend_taskid varchar(255)  
x_talend_timestamp bigint(20) NOT NULL

Note: Technical fields follow the same rules for NOT NULL constraints as user-defined elements.

Column length

It is possible to configure the length of elements based on string type. You can use restrictions on xsd:string in your data model:

<xsd:simpleType name="limitedString_15">
	<xsd:restriction base="xsd:string">
		<xsd:maxLength value="15"/>
	</xsd:restriction>
</xsd:simpleType>

(in this example, "limitedString_15" is a simple type that inherits from xsd:string but has a maximum length of 15 characters).

You may then use this simple type in your entity:

<xsd:element name="Product">
	<xsd:complexType>
		<xsd:sequence maxOccurs="1" minOccurs="1">
                	<xsd:element maxOccurs="1" minOccurs="1" name="Name" type="limitedString_15">
		</xsd:sequence>
	</xsd:complexType>
</xsd:element>

(in this example, the length value of "Name" must be less than or equal to 15 characters).

MDM will generate this database schema:

Product Type Constraint
x_id int NOT NULL
x_name varchar(15)  

If maximum length exceeds 255 characters, MDM will automatically switch to a CLOB type for all databases but Oracle. When using Oracle, MDM will create VARCHAR columns with a maximum size of 4000.

Repeatable elements mapping

Let's now consider that the entity Person may have multiple firstnames (an infinity of firstnames). An XML record would then be:

<Person>
	<id>1</id> <!-- minOccurs=1, maxOccurs=1 -->
	<lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> 
	<firstname>John</firstname> <!-- minOccurs=0, maxOccurs=unbounded -->
	<firstname>William</firstname> <!-- minOccurs=0, maxOccurs=unbounded -->
</Person>

Note #1 : In most cases, firstname element would be enclosed in a "firstnames" element. This scenario and impacts on database mapping is exposed in next section.

Note #2 : Foreign keys are also handled differently; this is described later in this article.

In this situation, two tables will be created: one to store the repeated elements, and one for the other elements.

Person
x_id
x_lastname
Person_x_firstname
x_id
value
pos

The table " Person_x_firstname" represents all the repeated element values. "x_id" in this table points to "x_id" of Person and "pos" indicates the position of the value in the value list (this is a reverse mapping). Since XML elements are ordered (and order matters to the user: he/she doesn't want to store "John" then "William" and retrieve the values in another order).

Consider the values actually stored in the database:

Person

x_id x_lastname
1 Smith
2 Doe

Person_x_firstname

x_id value pos
1 John 0
1 William 1
2 John 0

In the example above, Person id #1 has 2 firstnames ("John", "Willliam") and Person id #2 only has 1 firstname ("John").

Note: It becomes very easy to add an element at the end or rearrange elements within the list: you only need to modify the "pos" column.

Hierarchy mapping with 0..1

Starting from previous step where "firstname" became a repeatable element, let's now assume all "firstname" are wrapped in a unique XML element named "firstnames".

<Person>
	<id>1</id> <!-- minOccurs=1, maxOccurs=1 -->
	<lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> 
	<firstnames> <!-- minOccurs=0, maxOccurs=1 -->
		<firstname>John</firstname> <!-- minOccurs=0, maxOccurs=unbounded -->
		<firstname>William</firstname> <!-- minOccurs=0, maxOccurs=unbounded -->
	</firstnames>
</Person>

In this case (where "firstnames" is maxOccurs=1), there's no big change in the database mapping:

Person
x_id
x_lastname
Person_x_firstnames_firstname
x_id
value
pos

The table name changed from " Person_x_firstname" to "Person_x_firstnames_firstname". Since firstnames is maxOccurs=1, there's no need for an intermediate table.

Hierarchy mapping with 0..n

This section now describes what happens if a wrapper element is itself repeatable. Let's take this example:

<Person>
	<id>1</id> <!-- minOccurs=1, maxOccurs=1 -->
	<lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> 
	<firstnames> <!-- minOccurs=0, maxOccurs=1 -->
		<firstname>John</firstname> <!-- minOccurs=0, maxOccurs=unbounded -->
		<firstname>William</firstname> <!-- minOccurs=0, maxOccurs=unbounded -->
	</firstnames>
	<details> <!-- minOccurs=0, maxOccurs=unbounded -->
		<name>Allergies</name>
		<value>peanuts</name>
	</details>
	<details>
		<name>Beverage</name>
		<value>coffee</value>
	</details>
</Person>

When MDM detects a repeatable element that isn't a simple element (an element that contains "string", "int"), it automatically switches to a different mapping. This mapping is internally called "scattered" (the one applied before is called "flat"). When switching to scattered, all sub elements will have their own table.

Person
x_id
x_lastname
x_ details
X_Anonymous0
x_talend_id
x_name
x_value
Foreign key mapping 0..1

Let's now consider Person has an FK to a Group entity. A Person may or may not linked to a Group. From an XML perspective, this would give:

<Person>
	<id>1</id> <!-- minOccurs=1, maxOccurs=1 -->
	<lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> 
	<group>[1]</group> <!-- minOccurs=0, maxOccurs=1 --> 
</Person>

In this case, this Person instance references a Group instance with id "1". This creates a simple FK column in the table.

Person
x_id
x_lastname
x_ group
Group
x_id
x_name

The column "x_group" of the table "Person" is declared as a FK to the column "x_id" of the "Group" table. Please note that NOT NULL constraints also applies to the FK columns: if minOccurs >= 1 for "group" element, a NOT NULL constraint will also be added.

Note that when using version 5.2 and SQL, the next XML is exactly the same:

<Person>
	<id>1</id> <!-- minOccurs=1, maxOccurs=1 -->
	<lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> 
	<group>1</group> <!-- minOccurs=0, maxOccurs=1 --> 
</Person>

The square brackets can be omitted on insertion.

Note:

  • Once you read back the instance Person, MDM will put the square brackets back.
  • If 'group' is a composite key (i.e. key of entity "Group" is composed of > 1 element), this would raise an exception.
Foreign key type and referenced PK

It is possible to use any type for the key elements on an entity but the FK element must be a xsd:string .

Consider this example:

<Person>
	<id>1</id> 
	<lastname>Smith</lastname>
	<group>[1]</group>
</Person>

If you have this record, it becomes clear that the <group> FK element must be a string (it contains square brackets). But the Group entity could use an xsd:int for id.

MDM throws an exception if the content does not match the PK type, so this record will cause an exception when you try to save it:

<Person>
	<id>1</id>
	<lastname>Smith</lastname>
	<group>[abc]</group>
</Person>
Foreign key mapping 0..n and n..m

If Person has multiple FK to Address entity, the XML of the record would be:

<Person>
	<id>1</id> <!-- minOccurs=1, maxOccurs=1 -->
	<lastname>Smith</lastname> <!-- minOccurs=0, maxOccurs=1 --> 
	<address>[1]</address> <!-- minOccurs=0, maxOccurs=unbounded --> 
	<address>[2]</address> <!-- minOccurs=0, maxOccurs=unbounded --> 
</Person>

In this example, this Person has 2 FK to Address (one to Address with id 1, another to id 2). The generated database tables are:

Person
x_id
x_lastname
Person_2_Address
x_id
x_address_id
Address
x_address_id
x_street
x_city

The table "Person_2_Address" will store all relationships from Person to Address. In this case, we didn't use a reverse FK because:

  • Address might be referenced by other entities from the data model.
  • We keep the reverse FK for containment relationships. Since Address is an entity, it exists without a Person instance.

At the time of writing, no test ensures correct behavior when Address and Person share the same PK column name (Address PK column would be x_id).

Inheritance mappingEntity inheritance

One Entity can inherit from another in data models created from MDM Studio. This simply handled through a "per concrete type" strategy (one table per concrete type of the inheritance tree). This means there's no effect on the mapping. If you have type A and type B and B inherits from A, you will have two tables "A" and "B". The only difference you may notice is the "browse_items_A" view would return both instances of A and B (this is not what XML database is doing).

From version 5.2 onwards, a sub entity can not redefine the key from its super type (B cannot add a new ID field, since ID is declared in the top level entity and - once declared - cannot be changed by sub entities).

About record editing and views

Inheritance behaves differently when you use SQL storage. In a view for an entity that has sub types, instances of sub type will appear. If you have an Employee type that inherits from Person, both Person and Employee will appear in the Person view.

You can also edit values of a record from its super type view: if Person declares a field "name", the value of "name" for an Employee of this field can be edited from either the Person view or the Employee view.

About Foreign Key Picker

The Foreign Key Picker now shows all possible candidates for relation: if the FK points to type "Person", the Foreign Key Pcker will show instances from "Person" and "Employee" (a "Employee" is a "Person" due to the inheritance relationship).

Reusable type inheritance

Reusable types may inherit from each other (actually, this is the only type of inheritance that MDM Studio lets you define). In this case, a "per hierarchy" strategy is used: this means there will be a single table that will contain the union of all elements defined in the inheritance tree.

Let's take this sample data model (this is not a table model):

A
x_value_a
B
x_value_b
C
x_value_c

In this example, both B and C inherit from A. In the database, this would create a table "A":

X_A
x_talend_id
x_talend_class
x_value_a
x_value_b
x_value_c

There are some details worth noticing:

  • A technical ID has been created (this is the "x_talend_id" column that contains an auto-generated UUID not intended to be exposed to the end user).
  • A "discriminator" column has also been created (this is the "x_talend_class" column), that will help MDM to know what is the actual type of the reusable type instance.
  • A "X_" prefix was added: this is done to differentiate reusable types from entity types (a data model may define a entity "A" and a reusable type "A": this is legal and supported).