tMDMInput - 6.3

Talend Open Studio for Big Data Components Reference Guide


Function

tMDMInput reads data in the MDM Hub.

Purpose

This component reads data in an MDM Hub and thus makes it possible to process this data.

tMDMInput properties

Component family

Talend MDM

 

Basic Settings

Property Type

Either Built-In or Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

 

 

Built-In: No property data is stored centrally.

 

 

Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the fetched data.

 

Schema and Edit Schema

A schema is a row description that defines the number of fields to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

 

 

Built-in: The schema will be created and stored for this component only. Related Topic: see Talend Studio User Guide.

 

 

Repository: The schema already exists and is stored in the repository. You can reuse it in various projects and Jobs. Related Topic: see Talend Studio User Guide.

Use an existing connection

Select this check box if you want to use a configured tMDMConnection component.
 

MDM version

By default, Server 6.0 is selected. Although migrating existing Jobs to this version is recommended, the Server 5.6 option is available to ease migration: it lets Jobs designed for a 5.6 server keep working without modification against a 6.0 server. For this to work, an option must be enabled on the server so that it accepts and translates requests from such Jobs.

 

URL

Type in the URL to access the MDM server.

 

Username and Password

Type in user authentication data for the MDM server.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

 

Entity

Type in the name of the business entity that holds the data you want to read.

 

Data Container

Type in the name of the data container that holds the data you want to read.

Type

Select Master or Staging to specify the database on which the action should be performed.

 

Use multiple conditions

Select this check box to filter the data using certain conditions.

Xpath: Enter between quotes the path and the XML node to which you want to apply the condition.

Function: Select the condition to be used from the list.

Before using the conditions, bear in mind the following:

  • Depending on the type of field the Xpath points to, only certain operators apply. For example, if the field is a boolean, only the Equal or Not Equal operators are appropriate.

  • The operator Not Equal does not support multi-occurrence fields or complex type fields.

The following operators are available:

  • Contains: Returns a result which contains the word or words entered. Note that full text search does not support special characters, for example, @, #, $.

  • Contains the sentence: Returns one or more results which contain the sentence entered.

  • Joins With: This operator is reserved for future use.

  • Starts With: Returns a result which begins with the string entered.

  • Equal: Returns a result which matches the value entered.

  • Not Equal: Returns a result of any value other than the null value and the value entered.

  • is greater than: Returns a result which is greater than the numerical value entered. Applies to number fields only.

  • is greater or equal: Returns a result which is greater than or equal to the numerical value entered. Applies to number fields only.

  • is lower than: Returns a result which is less than the numerical value entered. Applies to number fields only.

  • is lower or equal: Returns a result which is less than or equal to the numerical value entered. Applies to number fields only.

  • whole content contains: Performs a plain text search using the specified Xpath field in the selected data container. If you enter an empty string "" in the Xpath field and select whole content contains from the Function list, searches will be performed in all the fields of all entities in the selected data container.

  • is empty or null: Returns an empty field or a null value.

Value: Enter between quotes the value you want to use. Note that if the value contains XML special characters such as /, you must also enclose it in single quotes ("'ABC/XYZ'"); otherwise the value is interpreted as an XPath.
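
The quoting rule for the Value field can be illustrated with a small Java helper. This is only a sketch: the class and method names are hypothetical and not part of any Talend API, and it handles only the / character mentioned above, not every XML special character.

```java
// Hypothetical helper, not part of Talend: illustrates the quoting rule
// for the Value field of a condition.
class MdmValueQuoting {

    // Wraps the raw value in single quotes when it contains '/',
    // so that it is not interpreted as an XPath. Other values are returned as-is.
    static String toConditionValue(String raw) {
        if (raw.contains("/")) {
            return "'" + raw + "'";
        }
        return raw;
    }

    public static void main(String[] args) {
        // The results are what you would then type between double quotes in the Value field.
        System.out.println(toConditionValue("ABC/XYZ"));
        System.out.println(toConditionValue("Shirt"));
    }
}
```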

Predicate: Select a predicate if you use more than one condition.

The following predicates are available:

  • Default: Interpreted as an and.

  • or: One of the conditions applies.

  • and: Both or all of the conditions apply.

The other predicates are reserved for future use and may be subject to unpredictable behavior.

If you clear this check box, you have the option of selecting particular IDs to be displayed in the ID value column of the IDS table.

Note

If you clear the Use multiple conditions check box, the Batch Size option in the Advanced settings tab is no longer available.

 

Skip Rows

Enter the number of lines to be ignored.

 

Max Rows

Maximum number of rows to be processed. If this value is set to 0, no row is read or processed.
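
As a rough mental model of how Skip Rows and Max Rows combine, the following Java sketch applies both settings to an in-memory list. It is illustrative only: the class and method names are hypothetical, and it assumes that rows are skipped first and the maximum is applied afterwards. It does not reproduce the query that the component actually sends to the MDM server.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: models Skip Rows and Max Rows on an in-memory list.
class SkipAndMaxRows {

    static List<String> page(List<String> rows, int skipRows, int maxRows) {
        return rows.stream()
                .skip(skipRows)   // ignore the first skipRows records
                .limit(maxRows)   // keep at most maxRows records; 0 keeps none
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        System.out.println(page(rows, 1, 3)); // skips r1, keeps the next three
        System.out.println(page(rows, 0, 0)); // a maximum of 0 yields no rows
    }
}
```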

 

Die on error

Clear this check box to skip the rows in error and complete the process for error-free rows. If needed, you can retrieve the rows in error via a Row > Rejects link.

Advanced settings

Batch Size

Number of lines in each processed batch.

Note

This option is not displayed if you have cleared the Use multiple conditions check box in the Basic settings view.

 

Loop XPath query

The XML structure node on which the loop is based.

 

Mapping

Column: reflects the schema as defined in the Edit schema editor.

XPath query: Type in the names of the fields to extract from the input XML structure.

Get Nodes: Select this check box to retrieve the XML node together with the data.

 

tStatCatcher Statistics

Select this check box to gather the processing metadata at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

NB_LINE: the number of rows processed. This is an After variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
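
In Job code, After variables are usually read from the globalMap using a key made of the component name and the variable name, for example ((Integer) globalMap.get("tMDMInput_1_NB_LINE")). The standalone Java sketch below simulates globalMap with a plain HashMap so that it can run outside a Job. The component name tMDMInput_1, the stored value, and the helper class are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch: inside a real Job, globalMap is provided by the generated code.
class GlobalVariablesSketch {

    // Reads NB_LINE for the given component, returning 0 when it has not been set yet.
    static int nbLine(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_NB_LINE");
        return value == null ? 0 : (Integer) value;
    }

    // Reads ERROR_MESSAGE for the given component; null when no error was recorded.
    static String errorMessage(Map<String, Object> globalMap, String componentName) {
        return (String) globalMap.get(componentName + "_ERROR_MESSAGE");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Assumed value, as if tMDMInput_1 had finished reading 12 rows.
        globalMap.put("tMDMInput_1_NB_LINE", 12);

        System.out.println("Rows read: " + nbLine(globalMap, "tMDMInput_1"));
        System.out.println("Error: " + errorMessage(globalMap, "tMDMInput_1"));
    }
}
```

Because both variables are After variables, they are meaningful only in a component or subjob that runs once tMDMInput has finished.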

Usage

Use this component as a start component. It needs an output flow.

Reading master data from an MDM hub

This scenario describes a two-component Job that fetches master data from an MDM server, and displays the data in the log console.

Prerequisites:

  • Make sure the MDM server is up and running.

  • Make sure you have imported the MDM demo project and loaded the sample data into the data container Product by running the Job MDM_LoadAll.

Creating a Job to read master data from MDM

  1. From the Palette, drop tMDMInput and tLogRow onto the design workspace.

  2. Link the two components together using a Row > Main connection.

Configuring basic settings of tMDMInput to read master data from MDM

  1. Double-click tMDMInput to open the Basic settings view.

  2. In the Property Type list, select Built-In.

  3. In the Schema list, select Built-In and click the [...] button next to Edit schema to open a dialog box and define the structure of the master data you want to read from the MDM server.

    In this example, three columns are defined to fetch three elements from the Product entity: Name, Price, and Colors.

  4. After you have defined the schema, click OK to close this dialog box, and then click Yes in the [Propagate] dialog box to propagate the schema changes to tLogRow.

  5. Enter the user name and password for accessing the MDM server.

  6. In the Entity field, enter Product between quotes.

  7. In the Data Container field, enter Product between quotes.

  8. Select Master from the Type list.

  9. Define the query conditions in the Operations area.

    In this example, we want to query the product data records whose names include Shirt.

    1. Click the [+] button to add a row.

    2. Enter Product/Name between quotes in the Xpath field.

      Note

      Apart from elements defined in entities, you can query metadata elements, which are also known as built-in elements. To query metadata elements from records in the master database, use the format metadata:<timestamp|task_id> in the path expression that selects the XML node to run the query on.

    3. Select Contains from the Function list.

    4. Enter Shirt between quotes in the Value field.

Configuring advanced settings of tMDMInput to read master data from MDM

  1. In the Component view, click the Advanced settings tab.

  2. In the Loop XPath query field, enter /Product between quotes as the XML structure node on which the loop is based.

  3. In the XPath query column of the Mapping table, enter the name of the XML node from which you want to collect the master data, next to the corresponding output column name.

  4. Select the Get Nodes check box for the Colors row to retrieve the XML node together with its data.
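
The relationship between the loop XPath, the mapping queries, and the Get Nodes option can be sketched outside the Studio with the standard Java XPath API. The sample record, the mapping queries (Name, Price, Features/Colors), and the class below are assumptions made for illustration; the actual structure of the Product entity in the demo project may differ, and this is not the code generated by the component.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LoopAndMappingSketch {

    // Hypothetical record; the real Product entity may be structured differently.
    static final String SAMPLE =
        "<Product><Name>Talend Golf Shirt</Name><Price>16.99</Price>"
      + "<Features><Colors><Color>White</Color><Color>Blue</Color></Colors></Features></Product>";

    static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Returns the nodes selected by the loop XPath, one per output row.
    static List<Node> loopNodes(String xml, String loopXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList found = (NodeList) XPATH.evaluate(loopXPath, doc, XPathConstants.NODESET);
            List<Node> nodes = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                nodes.add(found.item(i));
            }
            return nodes;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates a mapping query relative to the loop node and returns its text value.
    static String value(Node loopNode, String query) {
        try {
            return XPATH.evaluate(query, loopNode);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Like value(), but returns the matched node serialized as XML (the Get Nodes idea).
    static String nodeAsXml(Node loopNode, String query) {
        try {
            Node match = (Node) XPATH.evaluate(query, loopNode, XPathConstants.NODE);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(match), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (Node record : loopNodes(SAMPLE, "/Product")) {
            System.out.println("Name   = " + value(record, "Name"));
            System.out.println("Price  = " + value(record, "Price"));
            System.out.println("Colors = " + nodeAsXml(record, "Features/Colors"));
        }
    }
}
```

In this sketch, each node matched by the loop XPath produces one row, and each mapping query is evaluated relative to that node, which is why the queries are written without a leading /Product.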

Configuring the data display mode and executing the Job

  1. Double-click the tLogRow component to display its Basic settings view.

  2. In the Mode area, select Table (print values in cells of a table) for better readability of the result.

  3. Save the Job and press F6 to run it.

    The Product data records whose names include "Shirt" are displayed on the console with the values of three specified columns.