Processes - 6.3

Talend MDM Platform Studio User Guide

A Process defines multiple steps to achieve a data validation or human validation process, a transformation process, an enrichment process, a data integration process, and so on. Each step of a Process can use a plug-in, which is a mechanism to add specific capabilities to the process to perform a single task. The list of available plug-ins is extensible by J2EE developers. The steps defined in each Process and the plug-ins used will differ according to the task you want to accomplish.

One of the plug-ins available in any Talend MDM platform is the callJob plug-in, which invokes a Talend Job exposed as a Web Service. For further information on Process types and associated plug-ins, see Process types and Important plug-ins.

Parameters to set when you define a Process include:

  • Process name,

  • Process description,

  • sequence of steps: the list of all plug-ins included in the Process to be executed, one after the other,

  • step specifications: a Process plug-in consumes data in input parameters and produces a result in the output parameters. You must define variables to hold the result of a step. Then you send the variable to the input of the next step, and so forth. Eventually, you define a "pipeline" where each step result is chained to the next step through a variable.

    A plug-in may have multiple input variables and parameters, as well as multiple output variables and parameters.
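The chaining of steps through variables can be pictured with a minimal Python sketch. It is not how the MDM server is implemented: the step tuples, the variable names (source, trimmed, result) and the two toy functions standing in for plug-ins are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A step reads one pipeline variable and writes its result to another:
# (input variable, plug-in function, output variable). All names are hypothetical.
Step = Tuple[str, Callable[[str], str], str]


def run_process(steps: List[Step], pipeline: Dict[str, str]) -> Dict[str, str]:
    """Execute the steps in order, chaining them through named variables."""
    for input_variable, plugin, output_variable in steps:
        pipeline[output_variable] = plugin(pipeline[input_variable])
    return pipeline


# Two toy functions standing in for real plug-ins such as xslt or callJob.
def trim(text: str) -> str:
    return text.strip()


def upper(text: str) -> str:
    return text.upper()


steps = [
    ("source", trim, "trimmed"),   # step 1 writes the variable "trimmed"
    ("trimmed", upper, "result"),  # step 2 reads it and writes "result"
]
variables = run_process(steps, {"source": "  acme  "})
print(variables["result"])  # ACME
```

Each intermediate variable stays available in the pipeline, so a later step could read trimmed as well as result.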

When you design a Process, you combine specific Process plug-ins in a sequence of steps, which are executed one after the other to perform specific tasks.

For each step, do the following:

  1. Choose the appropriate plug-in.

  2. Enter or select an input variable and an input parameter.

  3. Select the output parameter and select or enter the output variable.

Note

It is possible to disable one or more steps at any time if you select the Disabled check box in the Step Specification area for the selected step.

For a step-by-step procedure on creating a Process, see How to create a Process from scratch.

Process types

When creating a new Process from Talend Studio, you can select one of the proposed Process types: Before Saving/Deleting, Entity/Welcome Action, Smart View, or Other.

The table below describes the different Process types:

Process type

Description

Associated plug-in

Other

Any type other than the listed ones. This process is usually executed after an event occurs on master data in the MDM Hub.

Any in the list of available plug-ins.

Before Saving

A Process that validates master data according to certain conditions before saving it in the MDM Hub. This Process can be linked with a Job designed in the Integration perspective to perform the validation operation automatically.

This Process can alter the MDM record before it is committed in the database. For instance, a Job may be run that completes the record with calculations and/or enrichments.

Note

The naming of this Process follows a specific pattern: beforeSaving_<entity>.

Note also that, for improved readability, in the MDM Repository view, Processes are stored in subfolders by type and only the second part of the name is displayed.

Any in the list of available plug-ins. If this Process is linked with a Talend Job, it uses the callJob plug-in. This plug-in calls the Job created in the Integration perspective to evaluate the data to be saved and returns an error message if the validation of this data fails.

For further information, see How to set schema for a Before Saving/Deleting Job.

Before Deleting

A Process that evaluates master data according to certain conditions before deleting it from the MDM Hub. This Process can be linked with a Job designed in the Integration perspective to perform the operation automatically.

Note

The naming of this Process follows a specific pattern: beforeDeleting_<entity>.

Note also that, for improved readability, in the MDM Repository view, Processes are stored in subfolders by type and only the second part of the name is displayed.

Any in the list of available plug-ins. If this Process is linked with a Talend Job, it uses the callJob plug-in. This plug-in calls the Job created in the Integration perspective to evaluate the data before deleting it and, if necessary, forbids the deletion by returning an error message.

For further information, see How to set schema for a Before Saving/Deleting Job.

Smart View

An XSLT-based Process that is automatically detected by Talend MDM Web User Interface.

It sets up a more customized graphical presentation for a given data object (hiding some fields, displaying icons, etc.). The business user may choose to display or print the object with this read-only personalized view or to switch to the usual generated view where edits are possible.

Note

The naming of this Process follows a specific pattern: Smart_view_<entity>_[<ISO2>][<#name>], where the two-character ISO language code and the name suffix are optional. <ISO2> allows you to define multilingual Smart Views, and the <#name> suffix allows you to have several alternative Smart Views of the same entity.

This Process name automatically falls back to Smart_view_<entity> if the language is not found, and the language you use to define the Smart View HTML parameters in the Smart View editor is picked as the default language.

Note that, for improved readability, in the MDM Repository view, Processes are stored in subfolders by type and only the second part of the name is displayed.

For further information, see How to create a Smart View Process.

Usually xslt, which transforms an XML document using XSLT.

But you can choose any number of steps using any plug-in, as long as the result at the end is HTML.

Entity Action

A Process that is created in the Studio and automatically listed in the Data Browser page in Talend MDM Web User Interface. A business user can then select any of these Processes listed in the Web User Interface and click the Launch Process button to start the selected Process. For further information, see Talend MDM Web User Interface User Guide.

This Process is always linked to a specific entity. You can design this Process to do any task you want, for example sending the entity by email or launching a workflow to modify the master data of the entity to which the Process is attached.

Note

The naming of this Process follows a specific pattern: Runnable_<entity>. However, if you want to customize the Process name, you must specify an Optional Name in the [Create Process] wizard. This adds a hash sign before the word you want to add to the Process name. For example, giving the Optional Name Send to an Entity Action Process on the Agency entity would give the Process name Runnable_Agency#Send.

Note also that, for improved readability, in the MDM Repository view, Processes are stored in subfolders by type and only the second part of the name is displayed.

For further information, see How to create an Entity Action Process.

Any in the list of available plug-ins. However, typical associated plug-ins are callJob and workflowcontexttrigger if the Process is linked with a Talend Job or a workflow.

Welcome Action

Similar to the Entity Action Process, but not linked to a specific entity.

This Process is created in the Studio and automatically listed in the [Welcome] page in Talend MDM Web User Interface.

You can design this Process to do any task you want, for example adding a new record/entity or launching a workflow to do certain modifications or synchronization on master data.

Note

The naming of this Process follows a specific pattern: Runnable#<name>; for example Runnable#AddNewRecord.

Note also that, for improved readability, in the MDM Repository view, Processes are stored in subfolders by type and only the second part of the name is displayed.

For further information, see How to create a Welcome Action Process.

Any in the list of available plug-ins. However, typical associated plug-ins are callJob and workflowcontexttrigger if the Process is linked with a Talend Job or a workflow.
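The Runnable naming patterns of the Entity Action and Welcome Action Processes can be illustrated with a short Python sketch. The helper function is hypothetical and only mirrors the patterns described in the table; the [Create Process] wizard builds these names for you.

```python
from typing import Optional


def runnable_process_name(entity: Optional[str] = None,
                          optional_name: Optional[str] = None) -> str:
    """Build a Process name that follows the Runnable naming patterns.

    Entity Action:  Runnable_<entity> or Runnable_<entity>#<name>
    Welcome Action: Runnable#<name>
    """
    if entity is None and not optional_name:
        raise ValueError("a Welcome Action Process needs a name")
    name = "Runnable"
    if entity is not None:
        name += "_" + entity        # Entity Action: attach the entity
    if optional_name:
        name += "#" + optional_name  # hash sign before the Optional Name
    return name


print(runnable_process_name("Agency", "Send"))      # Runnable_Agency#Send
print(runnable_process_name(None, "AddNewRecord"))  # Runnable#AddNewRecord
```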

Important plug-ins

Plug-ins are extra components that add specific capabilities to Talend MDM. Talend Studio proposes a list of plug-ins to be combined with a given Process. These plug-ins include callJob, groovy, and xslt.

The table below explains some of the plug-ins listed in the Studio and details their parameters.

Plug-in

Action

Description

callJob

Executes a Talend Job on master data (to modify or propagate it, for example).

For further information on the schemas used, see Schemas used in MDM processes to call Jobs.

This plug-in executes a Web service call to the server where the Web service is deployed, usually the MDM server.

Parameters:

url: the webservice port URL.

Name: name of the input variable.

Value: value of the input variable.

Note

If you want to view the related Job, click the Open Job button to open it in the Integration perspective.

groovy

Calls the groovy script and uses it to process and transform data.

This plug-in implements all the capabilities of the groovy script to process and transform data when it receives an Update Report. It can read the XML document, transform data and write in the XML document as well.

workflowtrigger

Passes an item to Talend MDM workflow engine.

This plug-in executes the process created in the BPM perspective in Talend Studio.

Parameters:

-processId: the Id of the process designed in the BPM perspective in Talend Studio.

Warning

Make sure to enter the complete processId; otherwise, an error message indicates that the process cannot be found. To display the complete processId in the General area, click in the white rectangle that represents the process pool in the BPM perspective. An example of a complete processId is AgencyRating_1_0_AgencyRating.

-processVersion: the version of the process.

-username: the user name for accessing the BPM server.

-password: the password for accessing the BPM server.

-variable(s): the variables which will be used in the workflow. Each variable is defined by the following settings:

scope: the scope of the variable, either process or activity.

activity Id: the Id of the activity; applies only if the scope is activity.

name: the name of the variable defined in the workflow process or activity.

type: the type of the variable, for example String or Boolean.

fromItem: true if the value comes from a part of an item, false otherwise.

xpath: the XPath of the value in the item; applies only if fromItem is true.

Warning

Make sure to enter the variable xpath to enable the workflowtrigger plug-in to send the complete variables parameters to the process instance. Otherwise, a processing error message is displayed.

value: the value of the variable; applies only if fromItem is false.

For information on creating and managing workflows, see Workflows.

xslt

Transforms an XML document using XSLT.

This plug-in implements XSLT transformations on an input XML document. It supports XSLT 2.0 and is enriched with cross-referencing capabilities: specific instructions that help to perform on-the-fly cross-referencing on any master data stored in the MDM Hub. When the output method of the XSLT is set to xml or to xhtml, cross-referencing is carried out after the XSLT is processed, on all elements with the following attributes:

<MyElement
     xrefCluster='CLUSTER'
     xrefIn='TEST1, ..., TESTN'
     xrefOut='XPATH_IN_ITEM'
     xrefIgnore='true|false'
     xrefDefault='DEFAULT_VALUE'
>OLD_VALUE</MyElement>

Below is a definition of each of these attributes:

-xrefCluster: the container (cluster) where the items used for cross-referencing are stored.

-xrefIn: a series of XPath tests to match the content of this item with a remote item.

-xrefOut: the XPath in the remote item, starting with the entity (concept) name, of the content that will replace the content of this item.

-xrefIgnore: optional, defaults to false. If set to true, the cross referencing will not fail if no item is found and the xrefDefault value will be inserted.

-xrefDefault: if xrefIgnore is set to true and the cross-referencing fails, this value will be used instead.

Input variables:

-xml: the xml on which to apply the XSLT.

-parameters: optional input parameters to the XSLT in the form of:

<Parameters>
   <Parameter>
       <Name>PARAMETER_NAME</Name>
       <Value>PARAMETER_VALUE</Value>
   </Parameter>
</Parameters>

Output variables:

-text: the result of the XSLT.

For an example on this plug-in, see Example of the xslt plug-in.

partialupdate

Performs partial updates on an item.

The partialupdate plug-in updates elements of an existing item from the content of a supplied XML. This plug-in provides the ability to:

-add sub elements or update existing elements,

-add sub elements to an existing list of sub-elements starting from a specified position.

Input variables:

-xml_instance: the XML used to find and update an existing item. The item to update is searched for based on the XML content, and the XML must follow certain specifications:

First, the root element must have the same name as the name of the Entity of the item.

Second, the XML must contain the value of all the item keys at the same XPath as those of the item unless item_primary_key is specified.

Third, other than the keys, the XML can contain more elements than the ones updated on the item, but it does not have to be valid against the item data model.

-item_primary_key: optional if the key values are set on the xml_instance. The primary key must be supplied as an object of type application/xtentis.itempk as returned by the project item plugin.

-data_model: (optional) the Data Model used to validate the item after update. Overwrites the corresponding value supplied in the parameters.

-clear_cache: optional, defaults to false. If set to true, the Data Model is re-read and parsed from the database for each invocation of the plug-in during the Process execution.

Output variables:

-item_primary_key: the primary key of the updated item as an object of type application/xtentis.itempk.
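The first two requirements on the XML supplied to the partialupdate plug-in (root element named after the Entity, key values present unless a primary key is passed separately) can be checked with a small Python sketch before the Process runs. This is only an illustration: the function name, its arguments and the Product sample with an Id key are hypothetical, and the plug-in performs its own checks on the server.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_update_xml(xml_text: str, entity: str, key_paths: List[str],
                     has_primary_key: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the XML looks usable."""
    problems = []
    root = ET.fromstring(xml_text)
    # The root element must have the same name as the Entity of the item.
    if root.tag != entity:
        problems.append(f"root element is {root.tag}, expected {entity}")
    # Key values are needed only when no primary key is supplied separately.
    if not has_primary_key:
        for path in key_paths:
            node = root.find(path)
            if node is None or not (node.text or "").strip():
                problems.append(f"missing key value at {path}")
    return problems


# Hypothetical Product entity whose key is the Id element.
print(check_update_xml("<Product><Id>42</Id><Price>9.90</Price></Product>",
                       "Product", ["Id"]))  # []
print(check_update_xml("<Product><Price>1</Price></Product>",
                       "Product", ["Id"]))  # ['missing key value at Id']
```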

Example of the xslt plug-in

The following XSLT fragment uses the cross-referencing attributes to replace a country code with a value looked up in the master data stored in the MDM Hub:

<Country
    xrefCluster='MYCLUSTER'
    xrefIn='.=Country/Codes/ISO2, ../Customer/Name=[ACME]'
    xrefOut='Country/Name/FR'
><xsl:value-of select='State/CountryCode'/></Country>

The example above does the following:

  • The XSLT generates a <Country> element in the target document,

  • The content of State/CountryCode of the source document is inserted as the value of the element,

  • The rest of the XSLT transformations complete,

  • The system queries the Country data in cluster MYCLUSTER where: Codes/ISO2 is equal to State/CountryCode (the current value of the Country element), and ../Customer/Name in the target document is equal to the hard-coded value ACME,

  • The matching Country document is returned and the value in Name/FR is extracted,

  • The value in Country of the target document is replaced with the extracted value.
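The lookup and replacement described above can be approximated in Python. This is a simplified sketch, not the plug-in's implementation: the in-memory CLUSTER list and its two Country records are hypothetical, only the first xrefIn test is reproduced (the test on Customer/Name is left out), and the ignore and default arguments mimic xrefIgnore and xrefDefault.

```python
import xml.etree.ElementTree as ET

# Hypothetical content of the MYCLUSTER container: two Country records.
CLUSTER = [
    ET.fromstring("<Country><Codes><ISO2>FR</ISO2></Codes>"
                  "<Name><FR>France</FR></Name></Country>"),
    ET.fromstring("<Country><Codes><ISO2>DE</ISO2></Codes>"
                  "<Name><FR>Allemagne</FR></Name></Country>"),
]


def cross_reference(element, xref_in, xref_out, ignore=False, default=None):
    """Replace element.text with the xref_out value of the matching record."""
    # Both paths start with the entity name, e.g. Country/Codes/ISO2.
    entity, in_path = xref_in.split("/", 1)
    out_path = xref_out.split("/", 1)[1]
    for record in CLUSTER:
        if record.tag == entity and record.findtext(in_path) == element.text:
            element.text = record.findtext(out_path)
            return element
    if ignore:
        # Like xrefIgnore='true': do not fail, insert the default value.
        element.text = default
        return element
    raise LookupError(f"no {entity} record matches {element.text!r}")


# The target element as generated by the XSLT, holding the country code.
target = ET.fromstring("<Country>FR</Country>")
cross_reference(target, "Country/Codes/ISO2", "Country/Name/FR")
print(ET.tostring(target, encoding="unicode"))  # <Country>France</Country>
```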

Schemas used in MDM processes to call Jobs

When a Job is called from an MDM process, it receives an XML document based on a specific schema. In return, the Job sends back a document which must also conform to a particular schema.

How to set the schema for a Job called through a Trigger

This is the typical case when a Process is called by a Trigger. The Process uses a callJob plug-in to invoke a Talend Job created in the Integration perspective of Talend Studio.

Input schema: a document is passed to the Job. Its schema is:

<item>
         ... record ...
</item>

Assuming a Customer record, the complete input document is:

<item>
        <Customer>
                <Firstname>Janet</Firstname>
                <Lastname>Richards</Lastname>
        </Customer>
</item>

Output schema: if the Job returns nothing, MDM generates a document with the Job return status in the callJob output variable:

<results>
        <item>
                <attr>0=ok or 1=failed</attr>
        </item>
</results>

If the Job returns a table through a tBufferOutput component, MDM defines the following document in the callJob output variable:

<results>

     <item>
             <attr>col1</attr>
             <attr>col2</attr>
             etc.
     </item>

</results>

This result may be mapped back into an Entity by adding the following fragment in callJob configuration:

<configuration>
(...)
   <conceptMapping>
         <concept>Customer</concept>
         <fields>
           {
           p0:Firstname,
           p1:Lastname,
           }
        </fields>
   </conceptMapping>
</configuration>

Then the callJob output variable will receive:

<results>
        <Customer>
                <Firstname>col1</Firstname>
                <Lastname>col2</Lastname>
        </Customer>
</results>
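The effect of the conceptMapping fragment can be reproduced outside MDM with a few lines of Python, which may help when testing what a Job returns. The function below is a hypothetical illustration, not part of the callJob plug-in: it renames the positional attr columns (p0, p1, and so on) to the mapped entity fields.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def map_results(results_xml: str, concept: str, fields: Dict[int, str]) -> str:
    """Rename positional <attr> columns to entity fields, like conceptMapping."""
    source = ET.fromstring(results_xml)
    target = ET.Element("results")
    for item in source.findall("item"):
        columns = item.findall("attr")
        record = ET.SubElement(target, concept)
        # fields maps a column position (the N of pN) to a field name.
        for position, field in sorted(fields.items()):
            ET.SubElement(record, field).text = columns[position].text
    return ET.tostring(target, encoding="unicode")


job_output = ("<results><item><attr>Janet</attr>"
              "<attr>Richards</attr></item></results>")
mapped = map_results(job_output, "Customer", {0: "Firstname", 1: "Lastname"})
print(mapped)
```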

How to set schema for a Before Saving/Deleting Job

The Before Saving and Before Deleting Processes are called directly by naming convention. They do not go through the usual Trigger > Process mechanism. Jobs called through a Before Saving or Before Deleting Process receive a different document than when they are called through a Trigger. In addition, they are expected to return a status report or an error message which the Web Interface can use to proceed with or cancel the action.

Note

The process must always return a variable called output_report.

Input

The input document comprises the update report as well as the record which is being saved or deleted:

<exchange>
      <report>
      ... update report ...
      </report>
      <item>
      ... record ...
      </item>
</exchange>

Note

You can always find the exact schema description of an update report in the MDM Repository tree view in Data Model > System > UpdateReport.

In the Job, you may put conditions similar to triggers. For instance, you may use exchange/report/Update/OperationType to implement different conditions on CREATE and UPDATE.
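The following Python sketch shows the idea of branching on the operation type outside a Job. The sample document is hypothetical and heavily trimmed (a real update report holds more elements), and the actions are placeholders; in a Talend Job the same test would be built with Job components.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed input document for a Before Saving Job.
exchange_xml = """
<exchange>
  <report>
    <Update>
      <OperationType>UPDATE</OperationType>
    </Update>
  </report>
  <item>
    <Customer><Firstname>Janet</Firstname><Lastname>Richards</Lastname></Customer>
  </item>
</exchange>
"""

exchange = ET.fromstring(exchange_xml)
operation = exchange.findtext("report/Update/OperationType")
record = exchange.find("item")[0]  # the first child of <item> is the record

# Placeholder actions: apply different conditions on CREATE and UPDATE.
if operation == "CREATE":
    action = "check mandatory fields"
elif operation == "UPDATE":
    action = "check changed fields"
else:
    action = "skip"
print(operation, record.tag, action)  # UPDATE Customer check changed fields
```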

Output

The Job is required to return a document that conforms to:

<report><message type="info">message</message></report>

Or to:

<report><message type="error">message</message></report>

Note

When you want to create a Before Saving Process that both checks validation rules and completes the record on the fly, you must define a two-step Process, with one step returning output_item and the other step returning output_report.

The working principles for the Before Saving and Before Deleting Processes can be summarized as described in the following three cases.

Upon completion of the Before Saving or Before Deleting processes, the MDM server looks for a variable called output_report in the Process pipeline.

First case: the MDM server finds the output_report variable.

  • If <report><message type="info">message</message></report>: the validation process of the data record has been carried out successfully and a message will be displayed. The data record will be successfully saved with the Before Saving Process, or successfully deleted with the Before Deleting Process.

  • If <report><message type="error">message</message></report>: the validation process of the data record fails and a message is displayed. The data record will not be saved with the Before Saving Process, and it will not be deleted with the Before Deleting Process.

Second case:

The MDM server has not found the output_report variable. The validation process of the data record has failed and an error message will be shown to confirm this. The data record will not be saved with the Before Saving Process, and it will not be deleted with the Before Deleting Process.

Third case:

The Process throws an exception (typically one of the steps in the Process leads to a technical error: wrong configuration, XSLT syntax error, Job not found or could not be called, and so on). A technical error message is displayed and the data record will not be saved with the Before Saving Process, and it will not be deleted with the Before Deleting Process.
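The three cases can be summarized as a small decision function. The Python sketch below only models the logic described above and is not the MDM server code: the function name, the run callable that stands for the Process execution, and the returned messages are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Tuple


def evaluate_before_process(run: Callable[[], Dict[str, str]]) -> Tuple[bool, str]:
    """Return (proceed, message) for a Before Saving or Before Deleting Process."""
    try:
        pipeline = run()
    except Exception as error:
        # Third case: a step failed, so the action is cancelled.
        return False, f"technical error: {error}"
    report_xml = pipeline.get("output_report")
    if report_xml is None:
        # Second case: no output_report variable, so the action is cancelled.
        return False, "output_report variable not found"
    # First case: type "info" lets the action proceed, "error" cancels it.
    message = ET.fromstring(report_xml).find("message")
    return message.get("type") == "info", message.text or ""


def failing_process() -> Dict[str, str]:
    raise RuntimeError("Job not found")


ok = {"output_report": '<report><message type="info">Record saved</message></report>'}
ko = {"output_report": '<report><message type="error">Price too low</message></report>'}
print(evaluate_before_process(lambda: ok))       # (True, 'Record saved')
print(evaluate_before_process(lambda: ko))       # (False, 'Price too low')
print(evaluate_before_process(lambda: {}))       # (False, 'output_report variable not found')
print(evaluate_before_process(failing_process))  # (False, 'technical error: Job not found')
```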

How to set up a callJob Process chain using the Create Process wizard

The [Create Process] wizard takes you through the generation of the complete callJob Process chain for each of the following types of Process: Before Saving or Before Deleting, Entity Action or Welcome Action, and Other. For a description of each type of Process, see Process types.

The steps to follow vary depending on the type of Process being generated.

Setting up a callJob Process chain for a Before Process

To set up the callJob Process chain for a Before Saving Process or a Before Deleting Process using the [Create Process] wizard, do the following:

  1. In the MDM Repository tree view, expand the Event Management node, right-click the Process node, and then click New.

    The [Create Process] wizard opens.

  2. Select which type of Process you want to create, and then click Next.

    The next screen of the [Create Process] wizard opens.

  3. Select whether you want to create a Before Saving Process or a Before Deleting Process, and then click the [...] button next to the Input Name field.

    The [Select one Entity] dialog box opens.

  4. Select the specific Entity for which you want to generate the Process, and then click Add to return to the [Create Process] wizard.

  5. Click Next.

    A dialog box opens in which you can input a multi-lingual message to accompany your Process.

  6. Define the error or informational message you want to display, as follows:

    1. Select the Message Type, error or info.

    2. Click the [...] button next to the Message field to open a dialog box in which you write the message and, if appropriate, a localized version in one or more additional languages.

    3. Click OK to close the [Set multi-lingual message] dialog box and return to the [Create Process] wizard.

  7. Click Next.

  8. Select or deselect the Generate the template job check box to specify whether you want to generate a template Job for the process, and then click Finish.

    The Process and, if appropriate, the template Job open.

Setting up a callJob Process chain for an Entity Action or Welcome Action Process

To set up the callJob Process chain for an Entity Action Process or a Welcome Action Process using the [Create Process] wizard, do the following:

  1. In the MDM Repository tree view, expand the Event Management node, right-click the Process node, and then click New.

    The [Create Process] wizard opens.

  2. Select which type of Process you want to create, and then click Next.

    The next screen of the [Create Process] wizard opens.

  3. Select whether you want to create an Entity Action Process or a Welcome Action Process.

    An Entity Action Process is linked to a specific Entity (which you define in the next step of the wizard). A Welcome Action Process appears as a standalone link in the Welcome page of the Talend MDM Web User Interface.

  4. Click the [...] button next to the Description field.

    The [Set multi-lingual message] dialog box opens.

  5. Enter the text for the English-language label to accompany your Process, and then click the [+] button.

    Note

    You can also define localized versions of the label for other languages if required.

  6. Click OK to return to the [Create Process] wizard.

  7. Click the [...] button next to the Entity field.

    The [Select one Entity] dialog box opens.

  8. Select the Entity for which you want to generate the Process, and then click Add to return to the [Create Process] wizard.

    Note

    If you are creating a Welcome Action Process, skip the Entity selection step and manually enter a name for your Process in the Optional Name field instead.

  9. Click Next.

    The next screen of the [Create Process] wizard opens.

  10. Select the Enable redirection check box if you want the Process to redirect the Web browser to a URL, and specify the URL in the URL field.

  11. Click Next.

  12. Select or deselect the Generate the template job check box to specify whether you want to generate a template Job for the process, and then click Finish.

    The Process and, if appropriate, the template Job open.

Setting up a callJob Process chain for an Other Process

To set up the callJob Process chain for an Other Process using the [Create Process] wizard, do the following:

  1. In the MDM Repository tree view, expand the Event Management node, right-click the Process node, and then click New.

    The [Create Process] wizard opens.

  2. Select which type of Process you want to create, and then click Next.

    The next screen of the [Create Process] wizard opens.

  3. Enter a name for your Process in the Input Name field, and then click Next.

  4. Select or deselect the Generate the template job check box to specify whether you want to generate a template Job for the process, and then click Finish.

    The Process and, if appropriate, the template Job open.

How to create a Process from scratch

When you design a Process, you combine specific Process plug-ins into a sequence of steps. These steps are then executed one after the other to perform specific tasks.

Whenever a data record is created, updated or deleted, the MDM Server generates a document and lists it under the UpdateReport node in the System data container in the MDM Repository tree view. This document describes the event by answering the who, what and when questions and by giving the record primary key and, in case of an update, the values before and after. The UpdateReport document contains everything about the event that just happened, but it does not contain the complete XML record that was created, updated or deleted. The document is then sent to the Event Manager. Whenever the Event Manager receives a document, it evaluates every Trigger condition against this document. For further information about Triggers, see Triggers.

The sequence of events that occurs whenever a record is created, updated or deleted in Talend MDM is as follows:

  • the Event Manager evaluates every defined Trigger to see if one or more Triggers have valid conditions,

  • the services defined in the Trigger which has valid conditions are performed,

  • if a callJob service has been defined in the Trigger, the Trigger uses the callJob plug-in to run a Talend Job.
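This sequence can be pictured with a minimal Python sketch. It is a conceptual model only, not the Event Manager implementation: the Trigger class, the dictionary standing for the UpdateReport document and its keys (Concept, Key, OperationType), and the string returned in place of a real callJob call are all simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trigger:
    name: str
    condition: Callable[[Dict[str, str]], bool]  # tested against the report
    service: Callable[[Dict[str, str]], str]     # run when the condition holds


def dispatch(update_report: Dict[str, str], triggers: List[Trigger]) -> List[str]:
    """Evaluate every Trigger and run the service of each one that matches."""
    return [t.service(update_report) for t in triggers
            if t.condition(update_report)]


triggers = [
    Trigger("price_changed",
            lambda r: r["Concept"] == "Product" and r["OperationType"] == "UPDATE",
            lambda r: f"callJob for {r['Concept']} {r['Key']}"),
    Trigger("family_created",
            lambda r: r["Concept"] == "ProductFamily" and r["OperationType"] == "CREATE",
            lambda r: f"callJob for {r['Concept']} {r['Key']}"),
]

# Simplified stand-in for an UpdateReport document.
report = {"Concept": "Product", "Key": "42", "OperationType": "UPDATE"}
print(dispatch(report, triggers))  # ['callJob for Product 42']
```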

In the example below, a data model called Product has been created in Talend Studio. This data model has two business entities: Product and ProductFamily. Several attributes have been created in the Product entity including Price and Family. You want to create a Process to automatically trigger a validation Job whenever a price of an item that belongs to a specific family has been changed through Talend MDM Web User Interface.

Prerequisite(s): You have already connected to the MDM server from Talend Studio. You have the appropriate user authorization to create Processes.

To create a Process to automatically trigger a Talend Job, do the following:

  1. In the MDM Repository tree view, expand Event Management and then right-click Process and select New from the contextual menu.

    The [Create Process] dialog box is displayed.

  2. Select the option corresponding to the Process type you want to create, and click Next.

  3. Enter a name for the new Process.

    In this example, you want to create an Other Process named Call_Job. For more information on Process types, see Process types.

  4. Click OK to close the dialog box.

    An empty editor for the newly created Process opens in the workspace.

  5. If required, click the three-dot button next to the Description field to open a dialog box where you can set multilingual descriptions of your Process.

  6. In the Step Description field, enter a name for the first step you want to define in the created Process and then click the icon to add the step name in the rectangle below the field.

  7. Repeat to add the two other steps included in this Process.