Skip to main content Skip to complementary content

Relational vs Hierarchical vs Object Orientated

Availability-noteDeprecated
This section describes different modelling approaches.

As discussed in the Talend MDM modelling training course, there is no requirement to model in third normal form (3NF) in Talend MDM. You can model at a level of abstraction from the physical storage (RDBMS), at a level that suits your requirements in terms of business processes that need to be enacted upon your master or reference data. In the previous section (Loops within loops), you saw that a purely hierarchical approach is also not desirable in many scenarios. As well as being able to model relationally and hierarchically, you can also model in Talend MDM using Object Orientation principles. Some models will combine all three modelling approaches. There is no right answer as to how to model something, but there are some guiding principles:

Leverage relational modelling where it is beneficial to your processes

With many to many relationships, for example if you have two entities : 'customer' and 'address', a customer might have many addresses and an address might be related to multiple customers. By modelling relationally – two entities with foreign key relationships – you can leverage the value of mastering the relationship. In this example, there could be value to being able to do a simple query that establishes that two customers have the same address. If 'address' was part of the 'customer' entity, you would have to query based on the values of your address (or some sort of hash key). The address would be held twice within MDM and you would be matching as part of a search process. If you model relationally, the two 'customer' entities would be related to the same 'address' entity. You can achieve this by cleansing and standardising your addresses with an external address validation tool (QAS, Logate, etc.) and then matching on 'address' as part of your load processes to ensure that duplicates are not inserted. Thus, the query is a simple "give me all the Customers with a foreign key relationship to Address X".

This principle can also be extended to reference data. Take for example the ISO country code example discussed earlier in this document. If you use country codes as part of an address, you can have a simple string(3) field to hold the code. However, if you also want to have the descriptive name, cross references to other source systems, a relationship between country and language, etc., putting all this information into the address entity is needlessly complex and a wasteful repetition of data. It is also error prone, as there is no validation of the ISO data as part of the model. Instead, it is much better to have an 'isoCountry' entity, or the reusable reference data entity discussed later in this document. By doing so, data that would be repeated in the denormalised approach is now held in one single record that can have governance processes designed for it. Changing the data in this single record automatically propagates the change to related records by virtue of the relationship. The additional country data can even be displayed in the Talend MDM Web UI, in the 'address' entity, using Foreign Key Infos. From the perspective of the 'address' entity, invalid 'country' data is now impossible to enter. In the Talend MDM Web UI, this manifests as a picklist for the user to select the relationship. Free-form text is not allowed.

Use hierarchical modelling to remove the need for join tables

While the previous section argues for the use of relationships where there is value to your MDM processes to do so, it is not an argument for a fully normalized model. In fact, a normalized approach can:

  • Increase integration complexity
  • Decrease performance substantially by having to insert into many entities at the same time to represent the master data of a single 'business' entity
  • Make the use of the Talend MDM Web UI a less optimal user experience, or even lead to orphan records and data inconsistency

A classic example of this is the many to many scenario discussed in the previous section. While there is value in having the relationship explicitly defined, modelling this structure in a flat, normalized approach would require three entities in your model: 'customer', 'address', and 'customerAddress' (to hold a record per relationship that exists between the two entities).

Instead, it is usually better to collapse the join table 'customerAddress' into one of the main entities as a repeating list of relationships. In this specific case, the list is normally collapsed into 'customer', which contains an element 'addressList' with a list of 0...many (or cardinality of your choice) address relationships. The complex type used to hold the relationship can also hold elements that provide context about the relationship, for example whether an address is a mailing or shipping address.

In the integration layer, it is very simple to perform the "give me all customers with this address" query. But if you were looking at the data in the Talend MDM Web UI, you may think that it would not be possible to do this without the join table. However, there are two approaches to this:

  • Browse to the address record you are interested in and click the Relations button, this allows you to reverse-engineer the relationships, even though the address record itself does not define any relationships.
  • Set up your MDM view searchable parameters for the customer view to allow searching by an address. The picker will allow this search based on both the technical key and data values you configure as Foreign Key Infos.

Finally, when dealing with repeating lists of values, you need to consider the following:

  • Do you need to update or delete values in the list? If so, you need a way of uniquely identifying an entry in the list. If the list does not naturally have a unique key, you may need to add a surrogate key to facilitate these types of actions. As mentioned earlier in this document, AUTO_INCREMENT and UUID types can be used for this, or perhaps a unique external key if the insert action is not allowed via the Talend MDM Web UI.
  • Do you need to cross reference entries in the list back to uniquely identifiable records in one or more source systems? If so, how will you hold these cross-references without introducing nested loops? This is a relatively rare scenario, but it does occur when a source system requires you to provide it with an ID to update or delete something, as opposed to:
    • Providing the full parent 'customer' record
    • Looking up the ID to update in the source system on the fly
    • Giving a before and after data instruction such as "Change: email: a@bc.com to ab@c.com"
    How you deal with updates to repeating list elements varies greatly depending on the business process, but it is important to consider how these updates will be done when designing the model.

Use Inheritance only where it is valuable

Conceptually, the idea of using inheritance in an MDM model can be appealing. Consider a simple example with addresses:

Here you see the Address base type holding the information that is common to all addresses. You also have UK and US-specific precisions on the base address that add country-specific fields. However, consider that:

  • Having to determine the specialization type in the integration layer at runtime adds considerable complexity to the integration logic.
  • You could actually achieve the same effect through relational modelling. Although there would be a small performance hit, the logic would be simpler.
  • It might actually be better to define a global address standard into which all types of address can be input. This is often the best approach of all, as it means that an organization-wide consensus on structure and process has been reached and the integration is simple and efficient.

Inheritance does have its place in Talend MDM, and this section will be expanded in a future release.

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – let us know how we can improve!