tMDMOutput - 6.3

Talend Components Reference Guide


Function

tMDMOutput receives data from the preceding component, generates an XML document, and then writes data in an MDM Hub using an output field.

Purpose

This component allows you to write data into or remove data from the MDM server.

tMDMOutput properties

Component family

Talend MDM

 

Basic settings

Property Type

Either Built-in or Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

 

 

Built-in: No property data is stored centrally.

 

 

Repository: Select the repository file where the properties are stored. The fields which follow are filled in automatically using the fetched data.

 

Input Schema and Edit schema

An input schema is a row description: it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

Click Sync columns to collect the schema from the previous component.

 

 

Built-in: You create the schema and store it locally for this component only. Related topic: see Talend Studio User Guide.

 

 

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: see Talend Studio User Guide.

Build the document

Select this check box if you want to build the document from a flat schema. If this is the case, double-click the component and map your schema in the dialog box that opens.

If the check box is not selected, you must select the column in your schema that contains the document from the Predefined XML document list.

 

Result of the XML serialization

Lists the name of the XML output column that will hold the XML data.

  Use an existing connection

Select this check box if you want to use a configured tMDMConnection component.

 

MDM version

By default, Server 6.0 is selected. Although migrating existing Jobs to this version is recommended, the Server 5.6 option eases the migration by letting such Jobs keep working without modification against a 6.0 server. For this to work, an option must be enabled on the server so that it accepts and translates requests from such Jobs.

 

URL

Type in the URL of the MDM server.

 

Username and Password

Type in the user authentication data for the MDM server.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

Note

Ensure that the user has been assigned a role in Talend MDM that allows connecting through a Job or any other web service call. For further information, see Talend Studio User Guide.

 

Data Model

Type in the name of the data model against which the data to be written is validated.

 

Data Container

Type in the name of the data container where you want to write the master data.

Note

This data container must already exist.

Type

Select Master or Staging to specify the database on which the action should be performed.

 

Return Keys

Columns corresponding to IDs in order: in sequential order, set the output columns that will store the return key values (primary keys) of the item(s) that will be created.

 

Is Update

Select this check box to update only the modified fields.

If you leave this check box cleared, all fields will be replaced by the incoming ones.

 

Fire Create/Update event

Select this check box to add the actions carried out to a modification report.

Source Name: Between quotes, enter the name of the application to be used to carry out the modifications.

Enable verification by "before saving" process: Select this check box to verify the commit that has just been added, prior to saving.

 

Add task id

Select this check box to set an identifier for the task. The tMDMOutput component will write this ID on the MDM server, which provides a way of tracking the task.

  • Select Custom to specify your own choice of ID in the Task id field. Note that you must enter the ID between quotes.

  • Select Use a column from the schema to display a drop-down list from which you can select which column from the schema to use as the Task ID.

 

Use partial update

Select this check box if you need to update multi-occurrence elements (attributes) of an existing item (entity) based on the content of a source XML stream.

Once selected, you need to set the parameters presented below:

- Pivot: type in the XPath to the multi-occurrence sub-element where data needs to be added, replaced or deleted in the item of interest.

For example, if you need to add a child sub-element to the below existing item:

<Person>
    <Id>1</Id> <!-- record key is mandatory -->
    <Children>
        <Child>[1234]</Child> <!-- FK to a Person Entity -->
    </Children>
</Person>

then the XPath you enter in the Pivot field must read /Person/Children/Child, and the Overwrite check box must be cleared.
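The add case can be pictured with a short Python sketch. It is only an illustration of the behavior described here, not MDM's implementation: the `append_at_pivot` helper and the `[5678]` foreign key value are hypothetical, and the pivot handling is simplified to plain element names.

```python
import xml.etree.ElementTree as ET

# Item as it exists in MDM (from the example above).
existing = ET.fromstring(
    "<Person><Id>1</Id><Children><Child>[1234]</Child></Children></Person>"
)
# Source XML stream; the [5678] value is a hypothetical new foreign key.
source = ET.fromstring(
    "<Person><Id>1</Id><Children><Child>[5678]</Child></Children></Person>"
)

def append_at_pivot(item, src, pivot):
    """Append each source sub-element found at the pivot (Overwrite cleared)."""
    # /Person/Children/Child -> parent path "Children", leaf "Child"
    parts = pivot.strip("/").split("/")[1:]
    parent_path, leaf = "/".join(parts[:-1]), parts[-1]
    target_parent = item.find(parent_path) if parent_path else item
    src_parent = src.find(parent_path) if parent_path else src
    for sub in src_parent.findall(leaf):
        target_parent.append(sub)
    return item

append_at_pivot(existing, source, "/Person/Children/Child")
children = [c.text for c in existing.findall("Children/Child")]
print(children)
```

The existing child is kept and the new one is appended after it.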

And, if you need to replace a child sub-element in an existing item:

<Person>
  <Id>1</Id>
  <Addresses>
    <Address>
      <Type>office</Type>
        (...address elements are here....)
    </Address>
    <Address>
      <Type>home</Type>
        (...address elements are here....)
    </Address>
  </Addresses>
</Person>

then the XPath you enter in the Pivot field must read /Person/Addresses/Address, the Overwrite check box must be selected, and the Key field must be set to /Type.

In this example, assuming the item in MDM only has an office address, the office address will be replaced, and the home address will be added.
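As a rough illustration of this replace-or-add behavior, the following Python sketch matches sub-elements on the key and either replaces them in place or appends them. It is a simplified model under stated assumptions, not MDM's implementation: the `overwrite_by_key` helper and the `City` elements are hypothetical, and the key is passed as a plain child name rather than an XPath.

```python
import xml.etree.ElementTree as ET

# Item as stored in MDM: only an office address (City values are hypothetical).
item = ET.fromstring(
    "<Person><Id>1</Id><Addresses>"
    "<Address><Type>office</Type><City>Paris</City></Address>"
    "</Addresses></Person>"
)
# Source XML stream: one office address and one home address.
source = ET.fromstring(
    "<Person><Id>1</Id><Addresses>"
    "<Address><Type>office</Type><City>Lyon</City></Address>"
    "<Address><Type>home</Type><City>Nantes</City></Address>"
    "</Addresses></Person>"
)

def overwrite_by_key(item, source, parent_path, leaf, key):
    """Replace the first sub-element whose key matches; append when none does."""
    target_parent = item.find(parent_path)
    for new in source.find(parent_path).findall(leaf):
        existing = list(target_parent)
        match = next(
            (i for i, old in enumerate(existing)
             if old.tag == leaf and old.findtext(key) == new.findtext(key)),
            None,
        )
        if match is None:
            target_parent.append(new)             # no match: added at the end
        else:
            target_parent.remove(existing[match])
            target_parent.insert(match, new)      # match: replaced in place
    return item

overwrite_by_key(item, source, "Addresses", "Address", "Type")
addresses = [(a.findtext("Type"), a.findtext("City")) for a in item.iter("Address")]
print(addresses)
```

The office address takes the values from the source stream, and the home address, which had no match, ends up after it.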

- Overwrite: select this check box if you need to replace or update the original sub-elements with the input sub-elements. Leave unselected if you want to add a sub-element.

- Key: type in the XPath, relative to the pivot, that is used to match a sub-element of the source XML with a sub-element of the item. If a key is not supplied, all sub-elements of the item with an XPath matching that of the sub-element of the source XML will be replaced. If more than one sub-element matches the key, MDM will update the first one it encounters. If no sub-element matches the key, the source sub-element is added at the end of the collection.

- Position: type in a number to indicate the position after which the new elements (those that do not match the key) will be added. If you do not provide a value in this field, the new elements will be added at the end.

- Delete: select this check box if you need to remove one or more sub-elements from the original sub-elements.

For example, if you need to remove two houses from the existing item below:

<Person>
	<Id>1</Id>
	<Name>p1</Name>
	<Houses>
		<House>[1]</House>
		<House>[2]</House>
		<House>[3]</House>
	</Houses>
	<Children>
		<Child>
			<Name>k1</Name>
			<Age>1</Age>
			<Habits>
				<Habit>Basketball</Habit>
				<Habit>Football</Habit>
				<Habit>Tennis</Habit>
				<Habit>Boxing</Habit>
			</Habits>
		</Child>
		<Child>
			<Name>k2</Name>
			<Age>2</Age>
		</Child>
	</Children>
</Person>

then the XPath you enter in the Pivot field must read /Person/Houses/House, the Delete check box must be selected, and the Key field must be set to . (a period) or left empty. In addition, you need to provide the source XML stream as follows:

<Person>
	<Id>1</Id>
	<Houses>
		<House>[1]</House>
		<House>[2]</House>
	</Houses>
</Person>

In this case, House [1] and House [2] will be deleted.
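The delete case can be sketched in Python as follows. This is only a model of the behavior described above, assuming that a Key of . means "compare the sub-element's own text"; the `delete_at_pivot` helper is hypothetical and is not part of MDM. The Children element of the example item is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Existing item, reduced to the parts relevant to the deletion.
item = ET.fromstring(
    "<Person><Id>1</Id><Name>p1</Name>"
    "<Houses><House>[1]</House><House>[2]</House><House>[3]</House></Houses>"
    "</Person>"
)
# Source XML stream listing the houses to remove.
source = ET.fromstring(
    "<Person><Id>1</Id><Houses><House>[1]</House><House>[2]</House></Houses></Person>"
)

def delete_at_pivot(item, source, parent_path, leaf):
    """Remove item sub-elements whose text matches a source sub-element."""
    to_delete = {s.text for s in source.find(parent_path).findall(leaf)}
    parent = item.find(parent_path)
    for sub in list(parent.findall(leaf)):   # copy: we mutate while looping
        if sub.text in to_delete:
            parent.remove(sub)
    return item

delete_at_pivot(item, source, "Houses", "House")
remaining = [h.text for h in item.findall("Houses/House")]
print(remaining)
```

Only House [3] remains; the rest of the item is untouched.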

For more examples of the partial update operations, see Examples of partial update operations using tMDMOutput.

 

Die on error

Clear this check box to skip the rows in error and complete the process for error-free rows. If needed, you can retrieve the rows in error via a Row > Rejects link. When the check box is selected, the Job stops on the first error.

Advanced settings

Extended Output

Select this check box to commit master data in batches. You can specify the number of lines per batch in the Rows to commit field.
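To picture what committing in batches means, here is a minimal Python sketch that groups rows into chunks of a given size. The `batches` helper and the sample rows are hypothetical; how the component actually sends each batch to the server is not shown.

```python
from itertools import islice

def batches(rows, rows_to_commit):
    """Yield consecutive lists of at most rows_to_commit rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, rows_to_commit))
        if not chunk:
            return
        yield chunk

# Seven hypothetical XML records, committed three at a time.
rows = [f"<Agency><Id>{i}</Id></Agency>" for i in range(1, 8)]
committed = [len(b) for b in batches(rows, rows_to_commit=3)]
print(committed)
```

With seven rows and three rows per batch, the last batch holds the single remaining row.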

 

Configure Xml Tree

Opens the interface which helps create the XML structure of the master data you want to write.

 

Group by

Select the column to be used to regroup the master data.

 

Create empty element if needed

This check box is selected by default. If the content of the interface's Related Column which enables creation of the XML structure is null, or if no column is associated with the XML node, this option creates an opening and closing tag at the required places.
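The effect of this option can be approximated with a small Python sketch. It assumes that, when the option is cleared, a node with a null value is simply left out; that behavior, the `build` helper, and the sample row are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def build(row, create_empty):
    """Build an Agency element; null columns become empty tags or are skipped."""
    root = ET.Element("Agency")
    for name in ("Id", "Name"):
        value = row.get(name)
        if value is None and not create_empty:
            continue                      # assumption: node omitted entirely
        ET.SubElement(root, name).text = value
    # short_empty_elements=False writes <Name></Name> rather than <Name />
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)

row = {"Id": "1", "Name": None}           # Name is null in the input row
with_empty = build(row, create_empty=True)
without_empty = build(row, create_empty=False)
print(with_empty)
print(without_empty)
```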

 

Advanced separator (for number)

Select this check box to change the separators used for numbers.

- Thousands separator: enter between double quotes the separator for thousands.

- Decimal separator: enter between double quotes the decimal separator.
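As an illustration of what custom thousands and decimal separators do to a number, consider the following Python sketch. The `format_number` helper and the two-decimal precision are assumptions; the sketch does not claim to reproduce how the component formats values internally.

```python
def format_number(value, thousands=",", decimal="."):
    """Format a number with two decimals using the given separators."""
    text = f"{value:,.2f}"                # e.g. 1,234,567.89
    # Go through a placeholder so the two separators can be swapped safely.
    return text.replace(",", "\0").replace(".", decimal).replace("\0", thousands)

formatted = format_number(1234567.891, thousands=".", decimal=",")
default = format_number(1234567.891)
print(formatted)
print(default)
```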

 

Generation mode

Select the appropriate generation mode according to your memory availability. The available modes are:

  • Slow and memory-consuming (Dom4j)

    Note

    This option allows you to use dom4j to process the XML files of high complexity.

  • Fast with low memory consumption

 

Encoding

Select the encoding type from the list, or select Custom and define it manually. This field is mandatory for the manipulation of data on the server.

 

tStatCatcher Statistics

Select this check box to gather the processing metadata at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

NB_LINE: the number of rows processed. This is an After variable and it returns an integer.

NB_LINE_REJECTED: the number of rows rejected. This is an After variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.

Usage

Use this component to write a data record and separate the fields using a specific separator.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Scenario: Writing master data in an MDM hub

This scenario describes a two-component Job that generates a data record, transforms it into XML and loads it into the defined business entity in the MDM server.

In this example, we want to load a new agency in the Agency business entity. This new agency has an id, a name and three offices located in different cities.

For more information about entities, see Talend Studio User Guide.

Dropping and linking the components

  1. From the Palette, drop tFixedFlowInput and tMDMOutput onto the design workspace.

  2. Connect the components using a Row > Main link.

Configuring the components

Preparing the data to be loaded into the MDM server

  1. Double-click tFixedFlowInput to view its Basic settings in the Component tab.

  2. In the Schema list, select Built-In and then click the three-dot button next to Edit schema to open a dialog box in which you can define the structure of the master data you want to write on the MDM server.

  3. Click the [+] button and add five columns of the type String.

    In this example, name the columns Id, Name, Office_R_and_D, Office_Sales, and Office_Services.

  4. Click OK to validate your changes.

  5. In the Number of rows field, enter the number of rows you want to generate.

  6. In the Mode area, select the Use Single Table option.

  7. In the Value fields, enter between quotes the values which correspond to each of the schema columns.

Basic settings of tMDMOutput

  1. In the design workspace, click tMDMOutput to open its Basic settings view.

  2. In the Input Schema list, select Built-In and then click the [...] button next to the Edit Schema field to define the structure of the master data you want to load into the MDM server.

    The tMDMOutput component basically generates an XML document, writes it in an output field, and then sends it to the MDM server.

  3. Click OK to proceed to the next step.

    The Result of the XML serialization list in the Basic settings view is automatically filled in with the output xml column.

  4. In the URL field, enter the URL to access the MDM server.

  5. In the Username and Password fields, enter the authentication information required to connect to the MDM server.

  6. In the Data Model field, enter between quotes the name of the data model against which you want to validate the master data you want to write.

  7. In the Data Container field, enter between quotes the name of the data container into which you want to write the master data.

  8. Select the Is Update check box if you only want to update some fields rather than the entire data record.

    Note

    If you want to use this component to write on the MDM server a task resolved in Talend Data Stewardship Console, select the Add task id check box, and then enter the ID of your choice or select the schema column you want to use as the ID.

    In this case, the MDM record on the MDM server will have a track back to the resolved task fetched from Talend Data Stewardship Console. For further information, see Talend Data Stewardship Console User Guide.

Advanced settings of tMDMOutput

  1. In the Component view, click Advanced settings to set the advanced parameters for the tMDMOutput component.

  2. Select the Extended Output check box if you want to commit master data in batches, and specify the number of lines per batch in the Rows to commit field.

    Click the [...] next to Configure XML Tree to open the tMDMOutput editor.

    Alternatively, double-click tMDMOutput to open the editor.

  3. In the Link target area to the right, click in the XML Tree field and then replace rootTag with the name of the business entity into which you want to insert the data record, Agency in this example.

  4. In the Linker source area, select the two schema columns Id and Name and drop them on the Agency node.

    The [Selection] dialog box is displayed.

    Select the Create as sub-element of target node option so that the two columns are linked to the two XML sub-elements of the Agency node.

  5. Right-click the root node Agency and then select Add Sub-element.

    In the dialog box that pops up, enter a name for the new sub-element, Offices in this example.

    Repeat the same procedure to create three new Office sub-elements under the Offices node, which corresponds to the multi-occurrence element Offices of the business entity Agency.

  6. In the Linker source area, select the three schema columns Office_R_and_D, Office_Sales and Office_Services and drop them on the three new Office nodes respectively.

    The [Selection] dialog box is displayed.

    Select the Create as sub-element of target node option so that the three columns are linked to the three XML sub-elements of the Offices node.

  7. Click OK to proceed to the next step.

  8. Right-click the element in the Link Target area you want to set as a loop element and select Set As Loop Element from the contextual menu.

    In this example, Id is the iterating object.

  9. Click OK to validate your changes and close the dialog box.
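The XML tree configured in these steps corresponds roughly to the document built by the Python sketch below. It is an approximation under stated assumptions: the row values are hypothetical, each Office node is assumed to carry its column value as text, and the `build_agency` helper is not part of Talend.

```python
import xml.etree.ElementTree as ET

# One input row using the five schema columns; the values are hypothetical.
row = {
    "Id": "1",
    "Name": "Agency A",
    "Office_R_and_D": "Paris",
    "Office_Sales": "London",
    "Office_Services": "Beijing",
}

def build_agency(row):
    """Map the flat row onto the Agency / Offices / Office structure."""
    agency = ET.Element("Agency")
    ET.SubElement(agency, "Id").text = row["Id"]
    ET.SubElement(agency, "Name").text = row["Name"]
    offices = ET.SubElement(agency, "Offices")
    for column in ("Office_R_and_D", "Office_Sales", "Office_Services"):
        ET.SubElement(offices, "Office").text = row[column]
    return agency

agency = build_agency(row)
xml_doc = ET.tostring(agency, encoding="unicode")
print(xml_doc)
```

The three office columns of the flat row become three occurrences of the same Office element, which is what makes Offices a multi-occurrence element.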

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Execute the Job by pressing F6 or clicking Run on the Run tab.

    The new data record is inserted in the Agency business entity in the DStar data container on the MDM server. As defined in the schema, this data record holds the agency id, the agency name and the agency offices located in three cities.

Removing master data partially from the MDM hub

This scenario describes how to partially remove the master data that was written into the MDM server in Scenario: Writing master data in an MDM hub.