Best Practice: MDM Jobs

EnrichVersion
6.4
6.3
6.2
6.1
6.0
5.6
EnrichProdName
Talend Data Fabric
Talend Open Studio for MDM
Talend MDM Platform
task
Design and Development > Third-party systems > MDM components
Data Quality and Preparation > Third-party systems > MDM components
Data Governance > Third-party systems > MDM components
EnrichPlatform
Talend Studio
Talend MDM Server

This article presents best practices for designing MDM Jobs.

Context groups and context variables

The following context variables are often used in an MDM project, and you will likely want to group them all in an MDMServer context group:
Context Variable            Type    Description                                Example value
context.MDM_Protocol        String  Protocol used to access the MDM API        http
context.MDM_Host            String  Hostname for the MDM Server                localhost
context.MDM_Port            Int     Port used to reach the MDM Server          8180
context.MDM_Username        String  Generic user used in MDM components        administrator
context.MDM_Password        String  Related password                           administrator
context.MDM_Datamodel       String  Name for the Data Model                    Product
context.MDM_Datacontainer   String  Name for the Data Container                Product
context.MDM_Version         String  Version
context.MDM_Extendedinsert  Int     Number of objects in each MDM write batch  200
context.MDM_Fetchsize       Int     Number of objects in each MDM read batch   100
context.MDM_Bulksize        Int     Number of objects in each MDM bulk batch   10000
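
As a minimal sketch of how these variables combine, the Java snippet below assembles a server base URL from the protocol, host and port. The class MdmContextSketch is a hypothetical stand-in for the context object that Talend Studio generates, and the service path that follows the port depends on the MDM version, so it is deliberately left out.

```java
// Hypothetical stand-in for the generated Talend context object;
// the field names follow the context variables listed above.
final class MdmContextSketch {
    String MDM_Protocol = "http";
    String MDM_Host = "localhost";
    int MDM_Port = 8180;

    // Builds the server base URL. The service path is version-dependent
    // and is intentionally not appended here.
    static String baseUrl(MdmContextSketch context) {
        return context.MDM_Protocol + "://" + context.MDM_Host + ":" + context.MDM_Port;
    }

    public static void main(String[] args) {
        System.out.println(baseUrl(new MdmContextSketch())); // prints http://localhost:8180
    }
}
```

Keeping the three parts separate, rather than storing one full URL, lets each environment override only the value that differs.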

MDM Components

tMDMConnection

Unify the connections to Talend MDM using a tMDMConnection component

Using tMDMConnection gives a job a single place where the MDM connection is configured.

Because the connection is contextualized once, you cannot forget to contextualize the other tMDM* components, which reduces the risk of a misconfigured tMDM* component in the job.

In practice, every tMDM* component should have the Use an existing connection option ticked, and a tMDMClose component should close the connection at the end of the job.

tMDMInput

Columns that aren’t used shouldn’t be gathered from tMDMInput

Gathering data from the MDM has a cost. Each tMDMInput should gather only the data it needs. This reduces the data payload and therefore increases the tMDMInput throughput.

It also reduces memory usage, so the job performs better overall.

Moreover, your jobs are easier to maintain, and your tMaps are easier to read.

tMDMOutput

Track the data lifecycle using Fire a Create/Update event

Tick Fire a Create/Update event in tMDMOutput when the change must be recorded in the journal. Most of the time, tracking the data lifecycle is an important requirement in an MDM project. Using the Java variable jobName is a convenient way to record which job is at the origin of an insert or update, so that end users can find this information in the MDM journal.
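
The idea can be sketched in Java as follows, assuming the event source field of the component accepts a Java expression. In a generated Job, jobName is a field of the Job class; here it is a plain constant so that the sketch runs on its own. The helper journalSource and the "Job_" prefix are hypothetical.

```java
final class JournalSourceSketch {
    // Stand-in for the jobName field of a generated Talend Job class.
    static final String jobName = "LoadProducts"; // hypothetical Job name

    // Hypothetical helper: prefixes the Job name so that journal entries
    // written by data integration jobs are easy to tell apart.
    static String journalSource(String prefix) {
        return prefix + jobName;
    }

    public static void main(String[] args) {
        System.out.println(journalSource("Job_")); // prints Job_LoadProducts
    }
}
```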

Tick tMDMOutput's Extended Output option and contextualize the value

Contextualizing the batch size makes it possible to tune job performance per environment, for example with different values for development and production. It also lets the Talend Administration Center administrator fine-tune MDM-related jobs without changing the job design.

Note: In some cases a fixed value is the better option (for example, when the number of rows to write is known in advance).
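
To illustrate why the batch size matters, the sketch below splits rows into batches of a configurable size. It is not the component's implementation; toBatches is a hypothetical helper, and extendedInsert stands in for context.MDM_Extendedinsert.

```java
import java.util.ArrayList;
import java.util.List;

final class ExtendedOutputSketch {
    // Splits rows into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> toBatches(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 450; i++) {
            rows.add(i);
        }
        int extendedInsert = 200; // stand-in for context.MDM_Extendedinsert
        System.out.println(toBatches(rows, extendedInsert).size()); // prints 3
    }
}
```

With 450 rows, a batch size of 200 yields three round trips to the server, whereas a batch size of 1 would yield 450. This is the trade-off the contextualized value lets you tune.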

Be consistent with the tMDMOutput encoding type

When using tMDMOutput as a virtual component (with the Build the document option ticked), tMDMOutput uses tWriteXMLField's default encoding. Yet, Talend MDM is fully UTF-8.

You should therefore make sure that your data integration jobs will output UTF-8 data to the MDM Server, by setting the right encoding in the component.

Note: You should also make sure that the SQL database storing the MDM data fully supports Unicode. In the case of Microsoft SQL Server 2008 R2, the MDM tables must be altered manually to change all varchar columns into nvarchar, so that characters used by languages such as Chinese are stored correctly.
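
The effect of an encoding mismatch can be shown with plain Java, independently of any MDM component. The sketch below encodes and decodes a two-character Chinese string (written with Unicode escapes) using UTF-8 and then ISO-8859-1. The helper roundTrip is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingSketch {
    // Encodes the text with the given charset, then decodes it with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String name = "\u4EA7\u54C1"; // two Chinese characters
        // UTF-8 can represent the characters, so the text survives.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // prints true
        // ISO-8859-1 cannot, so each character is replaced by '?'.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1)); // prints ??
    }
}
```

The data loss is silent: no exception is raised, which is why the encoding should be set explicitly rather than left at a default.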

tMDMSP

Avoid the use of tMDMSP when you can

tMDMSP, and Stored Procedures in general, should be avoided.

For portability reasons, use Stored Procedures only when what you want to do is not possible at all with the standard components, because Stored Procedures may make your job dependent on a particular persistence layer.

As a consequence, your Stored Procedures (and possibly the jobs that call them) will need to be rewritten when migrating from one persistence layer to another.

Before processes

Before processes (beforeSaving, beforeDeleting) are mainly used with the callJob plugin. The Talend Job that is invoked therefore needs to follow a specific pattern, both to be fully compliant with Talend MDM and to keep the external validation maintainable.

DON’T:

Before Talend v5.X, the common practice was to use a tMDMReceive component in order to get the <exchange> message coming from the MDM Server, and a tBufferOutput to send back the <report> message.

This approach is now deprecated and should be avoided: it relies on XPath, which makes it complex to use and harder to maintain.

DO:

Use the tMDMTriggerInput and tMDMTriggerOutput components instead.

Note:

Every Before process must send back a <report> message. Make sure that every possible path through your job sends the required <report> message.

If no <report> message is sent, the MDM Server vetoes the save, which is probably not what you expect from the Job.
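
The rule "every path returns a report" can be sketched in Java as follows. This is an illustration, not generated Job code. The validation rule is hypothetical, and the layout of the report (a message element with a type attribute set to info or error) is an assumption that should be checked against your MDM version.

```java
final class BeforeSavingSketch {
    // Hypothetical report builder; the element and attribute names are
    // an assumption based on the <report> message described above.
    static String report(String type, String message) {
        return "<report><message type=\"" + type + "\">" + message + "</message></report>";
    }

    // Hypothetical rule: a product needs a non-empty name and a non-negative price.
    // Every branch, including the failure branch, returns a report.
    static String validate(String name, String price) {
        try {
            if (name == null || name.trim().isEmpty()) {
                return report("error", "Name is mandatory");
            }
            if (Double.parseDouble(price) < 0) {
                return report("error", "Price must not be negative");
            }
            return report("info", "Record accepted");
        } catch (RuntimeException e) {
            // Unexpected failure (for example an unparsable price): still answer.
            return report("error", "Validation failed unexpectedly");
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("Widget", "9.5"));
        System.out.println(validate("Widget", "abc"));
    }
}
```

The catch block is the important part: without it, an unexpected exception would end the job before any report is produced.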

Best Practices – Routines

Whenever possible, use the MDM routines to improve the readability and future maintainability of the job. They are a good way to hide technical complexity (XML, Foreign Key mangling) from Data Integration developers.

Foreign keys columns

DON’T:

Do not concatenate square brackets to set up Foreign Keys mangling in Talend MDM.

DO:

Use the routine MDM.createFK() to mangle the Foreign Key data.

Note: It is a data model design best practice to indicate that a column contains a foreign key by suffixing its name with Fk. The suffix reminds developers to mangle the data in the Data Integration perspective.
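
To show what mangling means, the sketch below wraps each key part in square brackets. mangleFk is a hypothetical stand-in written for this illustration, not the MDM routine itself. In a Job you would call MDM.createFK() instead, after checking its exact signature in Talend Studio.

```java
final class ForeignKeySketch {
    // Stand-in for a mangling helper: wraps every key part in square brackets.
    static String mangleFk(String... keyParts) {
        StringBuilder mangled = new StringBuilder();
        for (String part : keyParts) {
            mangled.append('[').append(part).append(']');
        }
        return mangled.toString();
    }

    public static void main(String[] args) {
        System.out.println(mangleFk("42"));          // prints [42]
        System.out.println(mangleFk("FR", "75001")); // prints [FR][75001]
    }
}
```

Centralizing the bracket logic in one helper avoids scattered string concatenations that are easy to get wrong, especially for composite keys.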

Before-saving/Before-deleting return values

DON’T:

Do not write raw XML for a before-saving or before-deleting return value. Raw XML is hard to read and makes the return value harder to maintain.

DO:

Use the routine MDM.createReturnMessage() to build the return message value. This abstracts the XML message used in the column, and makes the component much more readable.
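
The benefit of such a helper can be sketched with a hypothetical stand-in. returnMessage below is not the MDM routine. Its report layout is an assumption, and the XML escaping it performs illustrates what a helper can centralize rather than documenting the routine's behavior.

```java
final class ReturnMessageSketch {
    // Escapes the characters that would break the XML message body.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Hypothetical stand-in for a return-message helper; the element and
    // attribute names are an assumption.
    static String returnMessage(String type, String message) {
        return "<report><message type=\"" + type + "\">" + escapeXml(message) + "</message></report>";
    }

    public static void main(String[] args) {
        // The caller states intent (type and text) and never touches the XML.
        System.out.println(returnMessage("error", "Price < 0 & missing"));
    }
}
```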

Documentation & Labels

As with any other kind of Talend job, documentation is crucial to quickly understand what the process is about. It should be possible to get as much detail as possible without having to dig into each component's settings.

These are some examples of how you can document the MDM components, sorted from poor to well-documented:

No documentation

This is the standard component, straight from the palette. Avoid leaving components in your jobs with the default tMDMInput label.

With the default label, it is impossible to guess which entity is being read or to understand the logic set in the component.

Someone who does not know the job has to open the component and read through the whole Basic Settings and Advanced Settings to get an idea of what it does, which wastes time and makes job maintenance a hassle.

Poor

This is the default label when you drag and drop a component from the Talend MDM repository metadata to the job.

This gives a first level of information, by showing the name of the entity you're using.

It can be enough if the input isn't filtered, and if it's a very straightforward use of the component.

Fair

This kind of label is useful when you set up some options in the MDM component (for example, filters in a tMDMInput or a partial update in a tMDMOutput).

It gives the developer a quick, efficient way to understand the component logic in this job.

There is no need to open the Parameters tab: everything is explained in the label.

Well-documented (with tooltip)

In some cases, it can be useful to tick the Show information checkbox, in the Documentation pane, to add a tooltip to the component.

This is mainly used when you need more room to explain complex logic set in the component, or to add side notes.

There is no need to set it on every MDM component, though. Keep it for special or more complex cases that need thorough documentation.