tMDMBulkLoad - 6.3

Talend Open Studio for Big Data Components Reference Guide


Function

tMDMBulkLoad writes XML structured master data into the MDM hub in bulk mode.

Purpose

This component writes data in bulk mode so that large batches of data, or highly complex data, can be uploaded quickly to the MDM server.

tMDMBulkLoad properties

Component family

Talend MDM

 

Basic settings

Schema and Edit Schema

A schema is a row description. It defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

Click Sync columns to collect the schema from the previous component.

 

 

Built-in: You create the schema and store it locally for this component only. Related topic: see Talend Studio User Guide.

 

 

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: see Talend Studio User Guide.

 

XML field

Select the name of the column in which you want to write the XML data.

 

URL

Type in the URL required to access the MDM server.

 

Username and Password

Type in the user authentication data for the MDM server.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

 

Data Model

Type in the name of the data model against which the data to be written is validated.

 

Data Container

Type in the name of the data container where you want to write the master data.

 

Entity

Type in the name of the entity that holds the data record(s) you want to write.

Type

Select Master or Staging to specify the database on which the action should be performed.

 

Validate

Select this check box to validate the data you want to write onto the MDM server against validation rules defined for the current data model.

Note that for the PROVISIONING Data Container, validation checks will always be performed on incoming records, regardless of whether or not this check box is selected.

For more information on how to set the validation rules, see Talend Studio User Guide.

Warning

If you need faster loading performance, do not select this check box.

 

Generate ID

Select this check box to generate an ID number for all of the data written.

Warning

If you need faster loading performance, do not select this check box.

Insert only

Select this check box to skip the step of checking whether the data records to be inserted already exist on the MDM server, thus achieving a better performance.

However, before using this option, you need to make sure that the data records do not exist in the database.

 

Commit size

Type in the row count of each batch to be written onto the MDM server.
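As a rough illustration of what a commit size means, the sketch below splits a list of records into batches of at most commit_size rows. It is written in Python purely for illustration: the function name batches and the sample values are hypothetical, and the code does not reflect how tMDMBulkLoad is implemented internally.

```python
from typing import Iterator, List, Sequence


def batches(rows: Sequence[str], commit_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `rows`, each holding at most `commit_size` items."""
    if commit_size <= 0:
        raise ValueError("commit_size must be a positive integer")
    for start in range(0, len(rows), commit_size):
        yield list(rows[start:start + commit_size])


# Hypothetical sample: five records sent with a commit size of 2.
sample = ["r1", "r2", "r3", "r4", "r5"]
result = list(batches(sample, 2))
print(result)
```

With five rows and a commit size of 2, the last batch holds the single remaining row.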

 

Use Transaction

Select this check box and then, in the Component List, select the existing connection component that will be used to commit the transaction.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the processing metadata at the Job level as well as at each component level.

Connections

Outgoing links (from this component to another):

Row: Main

Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error

Incoming links (from one component to this one):

Row: Main

Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error

For further information regarding connections, see Talend Studio User Guide.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.

Usage

This component always needs an incoming link that supplies XML-structured data. If your data is not yet in XML structure, use a component such as tWriteXMLField to transform it. For further information about tWriteXMLField, see tWriteXMLField.

If you use a Job with the tMDMBulkLoad component to bulk load large volumes of data into MDM, you can tune the bulk load operation by adding a specific JVM argument (for example, -Dbulkload.concurrent.http.requests=25) in the Advanced settings tab of the Job to limit the maximum number of concurrent requests sent to the MDM server. This avoids consuming all available Tomcat application server connections, which would otherwise lead to transaction and deadlock issues.
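The effect of capping concurrent requests can be illustrated with a generic sketch. The Python code below is not Talend code and does not show how the MDM client is implemented. It only demonstrates, under that assumption, how a semaphore keeps the number of simultaneous in-flight requests at or below a limit such as 25. The names send_batch, limiter, and MAX_CONCURRENT are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Mirrors the example value -Dbulkload.concurrent.http.requests=25.
MAX_CONCURRENT = 25

limiter = threading.BoundedSemaphore(MAX_CONCURRENT)
lock = threading.Lock()
in_flight = 0  # requests currently holding a slot
peak = 0       # highest value in_flight ever reached


def send_batch(batch_id: int) -> int:
    """Pretend to send one batch; the semaphore caps how many run at once."""
    global in_flight, peak
    with limiter:
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        # A real client would perform the HTTP request here.
        with lock:
            in_flight -= 1
    return batch_id


# More worker threads than slots, so the semaphore is what enforces the cap.
with ThreadPoolExecutor(max_workers=100) as pool:
    done = list(pool.map(send_batch, range(200)))

print(f"batches sent: {len(done)}, peak concurrency: {peak}")
```

However many worker threads are available, no more than MAX_CONCURRENT of them hold a slot at the same time, which is the kind of ceiling the JVM argument is meant to impose on requests to the server.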

Scenario: Loading records into a business entity

This scenario describes a Job that loads records into the ProductFamily business entity defined by a specific data model in the MDM hub.

Prerequisites:

  • The Product data container: This data container is used to separate the product master data domain from the other master data domains.

  • The Product data model: This data model is used to define the attributes, validation rules, user access rights and relationships of the entities of interest. Thus it defines the attributes of the ProductFamily business entity.

  • The ProductFamily business entity: This business entity contains the Id and Name attributes, both defined by the Product data model.

For further information about how to create a data container, a data model, and a business entity along with its attributes, see the MDM part of your Talend Studio User Guide.

The Job in this scenario uses three components.

  • tFixedFlowInput: This component generates the records to be loaded into the ProductFamily business entity. In a real-life project, the records to be loaded are often voluminous and stored in a file. However, to make this scenario easy to reproduce, this Job uses tFixedFlowInput to generate four sample records.

  • tWriteXMLField: This component transforms the incoming data into XML structure.

  • tMDMBulkLoad: This component writes the incoming data into the ProductFamily business entity in bulk mode, generating an ID value for each record.

Dropping and linking components

  1. Drop tFixedFlowInput, tWriteXMLField and tMDMBulkLoad onto the design workspace.

  2. Connect tFixedFlowInput to tWriteXMLField using the Main link.

  3. Do the same to connect tWriteXMLField to tMDMBulkLoad.

Configuring the components

Generating the data records to be loaded into a business entity

  1. Double-click tFixedFlowInput to open its Basic settings view.

  2. Click the [...] button next to Edit schema to open the schema editor.

  3. In the schema editor, click the [+] button to add one row.

  4. Name the new column. It is family in this example.

  5. Click OK to close the schema editor.

  6. In the Mode area of the Basic settings view, select the Use Inline Table option.

  7. Click the [+] button four times to add four rows in the table.

  8. In the inline table, click each of the added rows and enter one value per row between quotes: Shirts, Hats, Pets, and Mugs.

Transforming the incoming data into XML structure

  1. Double-click tWriteXMLField to open its Basic settings view.

  2. Click the [...] button next to the Edit schema field to open the schema editor and then add a row by clicking the [+] button.

  3. Click the newly added row in the right view of the schema editor and enter the name of the output column where you want to write the XML content. It is xmlRecord in this example.

  4. Click OK to validate this output schema and close the schema editor.

    In the dialog box that pops up, click OK to propagate this schema to the following component.

  5. In the Basic settings view, click the [...] button next to Configure XML Tree to open the dialog box where you can create the XML structure.

  6. In the Link target area, click rootTag and rename it to ProductFamily, which is the name of the business entity used in this scenario.

  7. In the Linker source area, drag family and drop it onto ProductFamily in the Link target area.

    A dialog box pops up, asking you to select one operation.

    Select Create as sub-element of target node to create a sub-element of the ProductFamily node. Then, the family element appears under the ProductFamily node.

    Right-click the newly created family node (renamed to Name in the next step) and select Set As Loop Element from the contextual menu.

  8. In the Link target area, click the family node and rename it to Name, which is one of the attributes of the ProductFamily business entity.

    Click OK to validate the XML structure you defined.
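To make the resulting structure concrete, the sketch below builds the XML that this configuration is expected to produce for the four sample values: a ProductFamily root with a single Name sub-element, one document per incoming row. It uses Python's standard xml.etree.ElementTree module only as an illustration. The function name to_xml_record is hypothetical, the exact serialization produced by tWriteXMLField may differ (for example in whitespace or an XML declaration), and the Id element is omitted on the assumption that the Generate ID option supplies it at load time.

```python
import xml.etree.ElementTree as ET

# The four sample values entered in the tFixedFlowInput inline table.
families = ["Shirts", "Hats", "Pets", "Mugs"]


def to_xml_record(family: str) -> str:
    """Wrap one family value as <ProductFamily><Name>...</Name></ProductFamily>."""
    root = ET.Element("ProductFamily")   # root tag renamed from rootTag
    name = ET.SubElement(root, "Name")   # sub-element renamed from family
    name.text = family
    return ET.tostring(root, encoding="unicode")


# One XML document per input row, as stored in the xmlRecord column.
xml_records = [to_xml_record(f) for f in families]
for record in xml_records:
    print(record)
```

Each string in xml_records corresponds to the content of the xmlRecord column for one row passed on to tMDMBulkLoad.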

Writing the incoming data into a business entity

  1. Double-click tMDMBulkLoad to open its Basic settings view.

  2. Select xmlRecord from the XML Field drop-down list.

  3. In the URL field, enter the bulk loader URL between quotes. For example, http://localhost:8180/talendmdm/services/bulkload.

  4. In the Username and Password fields, enter your login and password to connect to the MDM server.

  5. In the Data Model and the Data Container fields, enter the names corresponding to the data model and the data container you need to use. Both are Product for this scenario.

    In the Entity field, enter the name of the business entity into which you want to load the records. In this example, enter ProductFamily.

  6. Select the Generate ID check box in order to generate ID values for the records to be loaded.

  7. In the Commit size field, type in the batch size to be written into the MDM hub in bulk mode.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Execute the Job by pressing F6 or clicking Run on the Run tab.

    Log into your Talend MDM Web User Interface to check the newly added records for the ProductFamily business entity.