
MDM ID Generation

Availability note: Deprecated
This section describes the different ways to generate IDs with Talend MDM.

Every record in Talend MDM requires a unique primary key – unique within a given entity. Compound keys are not recommended. As a general best practice, this unique primary key ID should be generated by Talend MDM for master data records and can be optionally generated by Talend MDM for reference data, depending on the business processes surrounding the reference data and the model design. This is because generation of IDs by Talend MDM addresses the following issues:

  • Concurrency: with Talend MDM generating the keys, it is impossible for two records to be given the same key, even in high concurrency, real-time scenarios.
  • Generation of an organization-wide unique MDM identifier: an MDM record could potentially consist of data from records in source systems A, B and C, possibly from multiple records in each system. Therefore, it makes little sense to use the identifier generated from a single system as the primary key, especially as the life of the MDM record may persist beyond the life of that single source record. Generally, you will also master cross references to these source records, linked to your unique MDM ID, but it makes sense to be able to uniquely and consistently identify an entity that is critical to doing business by an ID that is recognised and understood by the whole organization, not just a single system at a single point in time.

Some reference data may have a unique key managed by a trusted external source. For example, if you need to hold ISO country information in your MDM hub and you adopt the ISO three-character code (ISO 3166-1 alpha-3) as your enterprise standard, you can trust that the ISO organization will not introduce a code that duplicates one from a previous release. It may therefore be appropriate to use the ISO three-character code as your primary key in MDM, especially since this data changes very slowly.

Talend MDM provides two different simple types for ID generation: AUTO_INCREMENT and UUID.

AUTO_INCREMENT

AUTO_INCREMENT is a sequence number which increments by one every time a record is created and starts at one by default. The base type is a string and must not be changed.

The counters' current values are stored in the CONF container.

They can be reset automatically using a Job or service. Alternatively, they can be reset manually using Talend Studio, with the Manage AutoIncrements button on the data container browser.

If a counter is reset to a value that causes the next generated ID to match a record already existing in MDM, that existing record will be overwritten.

AUTO_INCREMENT IDs are not guaranteed to be gap-free or strictly sequential. If you were to create records 1, 2 and 3 and then delete record 2, the next generated ID would not be 2, and it is not guaranteed to be 4, although it usually would be.
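
The counter behavior described above can be illustrated with a toy model. This is a minimal sketch, not Talend MDM's implementation: the class `ToyAutoIncrement` and its methods are hypothetical names invented for illustration, and in this simplified model the ID after a deletion is always the next integer, which the real product does not guarantee.

```python
class ToyAutoIncrement:
    """Toy model of a per-entity counter; NOT Talend MDM's implementation."""

    def __init__(self):
        self.counter = 0      # last value handed out; the first ID will be "1"
        self.records = {}     # primary key (string) -> record payload

    def create(self, payload):
        self.counter += 1
        key = str(self.counter)        # the ID's base type is a string
        self.records[key] = payload    # an existing key is silently overwritten
        return key

    def delete(self, key):
        del self.records[key]          # the counter is deliberately left untouched

    def reset(self, value):
        self.counter = value           # models a manual or Job-driven reset


store = ToyAutoIncrement()
for name in ("first", "second", "third"):
    store.create(name)

store.delete("2")
key_after_delete = store.create("fourth")   # the gap left by "2" is not refilled

store.reset(2)                              # counter is now below existing IDs
key_after_reset = store.create("fifth")     # collides with the existing record "3"
```

In this model the record originally stored under key "3" is replaced after the reset, which is the overwrite risk to check for before resetting a counter on a populated container.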

Note: The AUTO_INCREMENT type is not recommended for use in entities that require high load performance in incremental batch Jobs or real-time services.

Specifically, it should not be used:

  • When the performance requirements (volume, speed) of a batch process dictate that multiple write threads should be used concurrently via the Talend MDM components (MDM SOAP API underneath) or via the MDM REST API.
  • When using Talend ESB to create master data services that are required to be called by many concurrent clients during peak system usage.
  • When the use of the MDM REST API is required in high concurrency/volume scenarios.

Typically, the conditions above apply to the core master entities in a model. The best practice is therefore usually to use UUID rather than AUTO_INCREMENT for master data entities.

UUID

UUIDs, or Universally Unique IDentifiers, in their canonical form, are represented by 32 hexadecimal digits displayed in five groups separated by hyphens, in the form 8-4-4-4-12, for a total of 36 characters. For example:

09bf989f-5b24-47bc-871e-1e824d4f4c60
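
The canonical layout can be checked programmatically. The sketch below uses Python's standard `uuid` module to parse the example value and confirm the 8-4-4-4-12 grouping. The regular expression `CANONICAL` is a helper introduced here for illustration. The example value carries the version digit 4 (random UUID); whether Talend MDM always generates version 4 UUIDs is an assumption, not something this section states.

```python
import re
import uuid

# Lowercase canonical form: five hyphen-separated groups of hex digits.
CANONICAL = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

example = "09bf989f-5b24-47bc-871e-1e824d4f4c60"

parsed = uuid.UUID(example)                       # raises ValueError if malformed
groups = [len(part) for part in example.split("-")]
hex_digits = len(example.replace("-", ""))

fresh = str(uuid.uuid4())                         # a newly generated random UUID
is_canonical = CANONICAL.match(fresh) is not None
```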

According to Wikipedia, the chances of a UUID collision (generating an already existing UUID) are incredibly low:

"Only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. Or, to put it another way, the probability of one duplicate would be about 50% if every person on earth owned 600 million UUIDs."
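
The order of magnitude in the quotation can be reproduced with the standard birthday-bound approximation. The sketch below assumes version 4 UUIDs with 122 random bits (an assumption about the UUID variant, not a statement about Talend MDM internals). The function names are illustrative.

```python
import math

RANDOM_BITS = 122              # assumption: version 4 UUID (128 bits, 6 fixed)
SPACE = 2 ** RANDOM_BITS       # number of distinct random UUIDs


def collision_probability(n):
    """Birthday-bound approximation: p ~= 1 - exp(-n^2 / (2 * SPACE))."""
    return -math.expm1(-(n * n) / (2 * SPACE))


def count_for_probability(p):
    """Inverse of the approximation: n ~= sqrt(2 * SPACE * ln(1 / (1 - p)))."""
    return math.sqrt(2 * SPACE * math.log(1 / (1 - p)))


seconds_per_century = 100 * 365.25 * 24 * 3600
generated = 1e9 * seconds_per_century      # 1 billion UUIDs per second, 100 years

p_century = collision_probability(generated)
n_half = count_for_probability(0.5)        # UUIDs needed for a 50% collision chance
p_million = collision_probability(1e6)     # a more realistic hub size
```

Under these assumptions, `n_half` comes out on the order of 10^18 UUIDs and `p_century` lands in the same neighbourhood as the quoted 50%, while `p_million` is vanishingly small. For the record volumes of a typical MDM hub, collisions are not a practical concern.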

UUIDs have several advantages over AutoIncrement IDs:

  • No need to use Hazelcast to orchestrate key generation in an MDM cluster
  • IDs can be easily generated outside of MDM as well as inside MDM if needed (typically done when inserting via the bulk loader in an initial load process)
  • No need to reset counters when resetting the hub back to a known state (for example: an empty initial state, commonly done in testing)
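
As an illustration of generating IDs outside MDM without coordination, the sketch below has several independent workers assign UUIDs to records before a load. The field name `Id`, the record layout, and the function `assign_ids` are hypothetical. The actual load step (for example via the bulk loader) is not shown.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def assign_ids(rows):
    """Return copies of the rows with a client-generated UUID string as 'Id'."""
    return [{**row, "Id": str(uuid.uuid4())} for row in rows]


# Hypothetical source extracts, one batch per loader worker.
batches = [
    [{"Name": f"customer-{worker}-{i}"} for i in range(1000)]
    for worker in range(4)
]

# Each worker assigns IDs on its own: no shared counter, lock, or
# cluster-wide coordination is involved.
with ThreadPoolExecutor(max_workers=4) as pool:
    loaded = [row for batch in pool.map(assign_ids, batches) for row in batch]

ids = {row["Id"] for row in loaded}   # set of all assigned primary keys
```

Because no state is shared between workers, the same approach works across separate processes or machines, and nothing needs to be reset when the hub is restored to an empty state.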

Note: It is also possible to use the AUTO_INCREMENT and UUID types for non-primary key fields. This can be useful in certain circumstances, for example to allocate a unique ID to an entry in a repeating list of values.
