Best Practice: MDM Jobs
Context groups and context variables
Context Variable | Type | Description | Example value |
---|---|---|---|
context.MDM_Protocol | String | Protocol used to access the MDM API | http |
context.MDM_Host | String | Hostname for the MDM Server | localhost |
context.MDM_Port | Int | Port used to reach the MDM Server | 8180 |
context.MDM_Username | String | Generic user used in MDM components | administrator |
context.MDM_Password | String | Password of the generic MDM user (MDM_Username) | administrator |
context.MDM_Datamodel | String | Name for the Data Model | Product |
context.MDM_Datacontainer | String | Name for the Data Container | Product |
context.MDM_Version | String | Version | |
context.MDM_Extendedinsert | Int | Number of objects in each MDM write batch | 200 |
context.MDM_Fetchsize | Int | Number of objects in each MDM read batch | 100 |
context.MDM_Bulksize | Int | Number of objects in each MDM bulk batch | 10000 |
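To show how the types in the table above map to Java, here is a minimal sketch that loads the variables from key=value lines and fails fast on a missing or non-numeric value. It assumes a hypothetical per-environment properties file; the `MdmContext` class is illustrative only and is not the code Talend generates for a context group.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical typed view of the MDM context variables (not Talend-generated code). */
public final class MdmContext {
    final String protocol, host, username, password, datamodel, datacontainer, version;
    final int port, extendedInsert, fetchSize, bulkSize;

    private MdmContext(Properties p) {
        protocol = required(p, "MDM_Protocol");
        host = required(p, "MDM_Host");
        port = requiredInt(p, "MDM_Port");
        username = required(p, "MDM_Username");
        password = required(p, "MDM_Password");
        datamodel = required(p, "MDM_Datamodel");
        datacontainer = required(p, "MDM_Datacontainer");
        // The table gives no example value for MDM_Version, so it is treated as optional here.
        version = p.getProperty("MDM_Version", "").trim();
        extendedInsert = requiredInt(p, "MDM_Extendedinsert");
        fetchSize = requiredInt(p, "MDM_Fetchsize");
        bulkSize = requiredInt(p, "MDM_Bulksize");
    }

    /** Fails fast on a missing or blank String variable. */
    private static String required(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing context variable: " + key);
        }
        return value.trim();
    }

    /** Fails fast when an Int variable is missing or not a number. */
    private static int requiredInt(Properties p, String key) {
        String value = required(p, key);
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(key + " must be an integer, got: " + value, e);
        }
    }

    /** Loads key=value lines, e.g. the content of a hypothetical per-environment file. */
    static MdmContext load(Reader source) {
        Properties p = new Properties();
        try {
            p.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new MdmContext(p);
    }

    public static void main(String[] args) {
        String file = String.join("\n",
                "MDM_Protocol=http", "MDM_Host=localhost", "MDM_Port=8180",
                "MDM_Username=administrator", "MDM_Password=administrator",
                "MDM_Datamodel=Product", "MDM_Datacontainer=Product",
                "MDM_Extendedinsert=200", "MDM_Fetchsize=100", "MDM_Bulksize=10000");
        MdmContext ctx = load(new StringReader(file));
        System.out.println(ctx.host + ":" + ctx.port); // prints localhost:8180
    }
}
```

Failing at load time, rather than when a component first connects, makes a badly configured environment visible immediately.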
MDM Components
tMDMConnection
Unify the connections to Talend MDM using a tMDMConnection component
Using tMDMConnection gives the job a single point for configuring the MDM connection.
Because the other tMDM* components reuse this connection, you cannot forget to contextualize one of them, which reduces the chance of a misconfigured tMDM* component in the job.
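The benefit of a single point of configuration can be sketched in plain Java: one settings object is built once, and every component holds a reference to it instead of its own copy of the values. The classes below are hypothetical stand-ins for tMDMConnection and the tMDM* components that reuse it; the base URL is only protocol://host:port, and the actual endpoint path used by the components is left out.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: one connection definition shared by every MDM component in a job. */
public final class SharedMdmConnection {

    /** Plays the role of tMDMConnection: the only place where connection values are set. */
    static final class ConnectionSettings {
        final String protocol;
        final String host;
        final int port;
        final String username;

        ConnectionSettings(String protocol, String host, int port, String username) {
            this.protocol = protocol;
            this.host = host;
            this.port = port;
            this.username = username;
        }

        String baseUrl() {
            return protocol + "://" + host + ":" + port;
        }
    }

    /** Plays the role of a tMDM* component that reuses the existing connection. */
    static final class Component {
        final String name;
        final ConnectionSettings connection;

        Component(String name, ConnectionSettings connection) {
            this.name = name;
            this.connection = connection;
        }

        String describe() {
            return name + " -> " + connection.baseUrl() + " as " + connection.username;
        }
    }

    public static void main(String[] args) {
        // In a job these values would come from context.MDM_Protocol, MDM_Host, MDM_Port and MDM_Username.
        ConnectionSettings mdm = new ConnectionSettings("http", "localhost", 8180, "administrator");
        List<Component> components = Arrays.asList(
                new Component("tMDMInput_1", mdm),
                new Component("tMDMOutput_1", mdm));
        for (Component c : components) {
            System.out.println(c.describe()); // e.g. tMDMInput_1 -> http://localhost:8180 as administrator
        }
    }
}
```

Changing the host or port in the single settings object changes it for every component, which is exactly what rules out a forgotten, non-contextualized component.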
tMDMInput
Columns that aren’t used shouldn’t be gathered from tMDMInput
Reading data from MDM has a cost, so each tMDMInput should retrieve only the columns the job actually needs. This reduces the data payload, which increases tMDMInput throughput.
It also lowers memory usage in the job, so the job as a whole performs better.
tMDMOutput
Track the data lifecycle using Fire a Create/Update event
Tick Fire a Create/Update event in tMDMOutput when the operation must be recorded in the journal. Tracking the data lifecycle is an important requirement in most MDM projects. Using the Java variable jobName is a convenient way to record which job originated an insert or update, so that end users can find this information in the MDM journal.
Tick tMDMOutput's Extended Output option and contextualize its value
This lets you tune job performance for each environment, for example development versus production. It also lets the Talend Administration Center administrator fine-tune the performance of MDM-related jobs by changing context values rather than the jobs themselves.
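The value behind Extended Output acts as a batch size: records are sent to MDM in groups of N rather than one by one. The sketch below is not the component's internal code; it only illustrates, under that assumption, how the same records split into a different number of round trips depending on an N read from a context variable. The development and production values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch: group records into batches whose size comes from the context. */
public final class ContextBatching {

    /** Splits records into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> batches(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            result.add(new ArrayList<>(records.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 450; i++) {
            records.add(i);
        }
        // Hypothetical per-environment values, as they might be set in two context groups.
        int devBatchSize = 50;
        int prodBatchSize = 200;
        System.out.println(batches(records, devBatchSize).size());  // 9 batches
        System.out.println(batches(records, prodBatchSize).size()); // 3 batches (200, 200, 50)
    }
}
```

Because the size is a parameter rather than a constant, the same job can be tuned per environment without being rebuilt.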
Be consistent with the tMDMOutput encoding
When tMDMOutput is used as a virtual component (with the Build the document option ticked), it uses tWriteXMLField's default encoding. Talend MDM, however, works entirely in UTF-8.
Make sure your data integration jobs send UTF-8 data to the Talend MDM Server by explicitly setting UTF-8 as the encoding in the component.
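What goes wrong with a mismatched encoding can be shown in a few lines of Java. ISO-8859-1 stands in here for any non-UTF-8 encoding; the text above does not say which default tWriteXMLField uses, so treat that choice as an assumption of the sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Shows what a UTF-8 receiver sees when the sender used another encoding. */
public final class EncodingMismatch {

    /** Encodes text with the sender's charset, then decodes it as UTF-8, as a UTF-8 server would. */
    static String readAsUtf8(String text, Charset senderCharset) {
        byte[] sent = text.getBytes(senderCharset);
        return new String(sent, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "Caf\u00e9"; // "Café", escaped so the source file encoding does not matter
        // 'é' takes one byte in ISO-8859-1 but two bytes in UTF-8.
        System.out.println(name.getBytes(StandardCharsets.ISO_8859_1).length); // 4
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length);      // 5
        // The lone ISO-8859-1 byte for 'é' is not valid UTF-8, so it is decoded as U+FFFD.
        System.out.println(readAsUtf8(name, StandardCharsets.ISO_8859_1).equals(name)); // false
        System.out.println(readAsUtf8(name, StandardCharsets.UTF_8).equals(name));      // true
    }
}
```

Plain ASCII survives either way, which is why this kind of mismatch often only shows up once accented or non-Latin data reaches MDM.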
tMDMSP
Avoid tMDMSP whenever possible
Avoid tMDMSP, and Stored Procedures in general.
For portability reasons, only use Stored Procedures when what you need to do is not possible at all with the standard components: Stored Procedures can make your job dependent on a particular persistence layer.
Your Stored Procedures (and possibly the jobs that call them) would then need to be rewritten when migrating from one persistence layer to another.
Before processes
Before processes (beforeSaving, beforeDeleting) are mainly used with the callJob plugin. The Talend Job they invoke must therefore follow a specific pattern to be fully compliant with Talend MDM and to keep the external validation maintainable.
DON’T:
Before Talend v5.X, the common practice was to use a tMDMReceive component in order to get the <exchange> message coming from the Talend MDM Server, and a tBufferOutput to send back the <report> message.
This approach is now deprecated and should be avoided: it relies on XPath, is complex to use, and is harder to maintain.
DO:
Use the tMDMTriggerInput and tMDMTriggerOutput components instead.
Every before process must send back a <report> message. Make sure that every possible execution path in your job sends the expected <report> message.
If no <report> message is sent, the Talend MDM Server vetoes the save, which is probably not what you intended the job to do.
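The rule "every path sends a report" is easiest to enforce when the validation logic between tMDMTriggerInput and tMDMTriggerOutput is written so that each branch ends in exactly one report. The sketch below assumes a hypothetical Product record with an id and a price, and models the report as a type plus a message instead of the actual XML; the INFO/ERROR split is an assumption of this sketch.

```java
/** Sketch: a before-saving validation where every path produces exactly one report. */
public final class BeforeSavingValidation {

    enum ReportType { INFO, ERROR }

    static final class Report {
        final ReportType type;
        final String message;

        Report(ReportType type, String message) {
            this.type = type;
            this.message = message;
        }
    }

    /** Hypothetical rules for a Product record; every branch returns a Report, none returns nothing. */
    static Report validate(String productId, Double price) {
        try {
            if (productId == null || productId.isEmpty()) {
                return new Report(ReportType.ERROR, "Product id is mandatory");
            }
            if (price == null || price < 0) {
                return new Report(ReportType.ERROR, "Price must be a non-negative number");
            }
            return new Report(ReportType.INFO, "Validation passed");
        } catch (RuntimeException e) {
            // Unexpected failures still produce a report instead of leaving the server without one.
            return new Report(ReportType.ERROR, "Validation failed: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        Report ok = validate("P-001", 9.99);
        Report ko = validate("P-002", -1.0);
        System.out.println(ok.type + ": " + ok.message); // INFO: Validation passed
        System.out.println(ko.type + ": " + ko.message); // ERROR: Price must be a non-negative number
    }
}
```

Since `validate` cannot complete without returning a `Report`, the Java compiler itself checks that no path forgets the report; the job then only has to turn that object into the message sent by tMDMTriggerOutput.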
Best Practices – Routines
Foreign key columns
DON’T:
DO:
Before-saving/Before-deleting return values
DON’T:
Do not write raw XML by hand for a before-saving or before-deleting return value. It is hard to read, easy to get wrong, and makes the return value harder to maintain.
DO:
Use the routine MDM.createReturnMessage() to build the return message value. This abstracts the XML message used in the column, and makes the component much more readable.
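A routine pays off because the XML structure and escaping live in one place. The helper below is not MDM.createReturnMessage() and does not reproduce its signature; it is a hypothetical stand-in, assuming a report of the form <report><message type="...">...</message></report> with info and error types, that shows what such an abstraction takes care of, such as escaping characters that would otherwise make the XML invalid.

```java
/** Hypothetical stand-in for a return-message routine; not Talend's MDM routine. */
public final class ReturnMessages {

    /** Escapes the five XML special characters. */
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds the assumed report format; restricting type to info/error is also an assumption. */
    static String createReport(String type, String message) {
        if (!"info".equals(type) && !"error".equals(type)) {
            throw new IllegalArgumentException("Unsupported report type: " + type);
        }
        return "<report><message type=\"" + type + "\">" + escapeXml(message) + "</message></report>";
    }

    public static void main(String[] args) {
        System.out.println(createReport("error", "Price < 0 for item A&B"));
        // prints <report><message type="error">Price &lt; 0 for item A&amp;B</message></report>
    }
}
```

A hand-concatenated string with the same message would produce `<` and `&` unescaped, i.e. invalid XML, which is the kind of mistake a shared routine removes.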
Documentation & Labels
As for any other kind of Talend job, documentation is crucial to quickly understand what the process does. A reader should get as much detail as possible without having to open each component's settings.
Documentation level | Description |
---|---|
No documentation | The component keeps the standard tMDMInput label it gets when dropped from the palette. Avoid this: the label does not tell which entity is read, nor what logic is configured in the component. Anyone unfamiliar with the job has to open the component and read through both Basic Settings and Advanced Settings to understand it, which wastes time and makes job maintenance harder. |
Poor | The label is the default one set when you drag and drop a component from the Talend MDM repository metadata onto the job. It gives a first level of information by showing the name of the entity being used. This can be enough when the input is not filtered and the component is used in a very straightforward way. |
Fair | The label also describes the options set in the MDM component, e.g. filters in a tMDMInput or a partial update in a tMDMOutput. This gives the developer a quick, efficient way to understand the component's logic in the job, without opening its settings: everything is explained in the label. |
Well-documented (with tooltip) | In addition, the Show information checkbox in the Documentation pane is ticked to add a tooltip to the component. This is mainly useful when you need more room to explain complex logic set in the component, or to add side notes. There is no need to do this on every MDM component: keep it for special or more complex cases that call for thorough documentation. |