tMDMRestInput - 6.3

Talend Components Reference Guide

Function

tMDMRestInput reads data from the MDM Hub through the REST API.

Purpose

This component reads data from the MDM Hub for further processing.

tMDMRestInput properties

Component family

Talend MDM

 

Basic settings

Schema and Edit Schema

A schema is a row description. It defines the number of fields that will be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

 

 

Built-In: The schema will be created and stored for this component only. See Talend Studio User Guide for more information.

 

 

Repository: The schema already exists and is stored in the Repository. You can reuse it in various projects and Jobs. See Talend Studio User Guide for more information.

 

Use an existing connection

Select this check box if you want to use a configured tMDMConnection component.

 

URL

Enter the URL to access the MDM server through the REST API.

 

Username and Password

Enter the user authentication data for the MDM server.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

 

Data Container

Enter the name of the data container that holds the data records you want to read.

Type

Select Master or Staging to specify the type of database on which the reading action should be performed.

 

Retrieve raw data

Select this check box to retrieve all elements of an entity into a single field.

  • XML field: Select the name of the field in which you want to write the retrieved data.

  • Accept Type: Select the type of content (XML or JSON) you want to get.

 

Query Text

Enter the query text you want to include in REST API calls to retrieve the data records of interest.

Apart from the default sample query, the query text can be:

  • a globalMap variable, for example, ((String)globalMap.get("row1.query"))

  • a pre-escaped context variable, for example, context.lpcMDMRestQuery

  • a query including the globalMap variable and/or the context variable, for example, "{'select':{'from':['"+context.myEntity +"'],'fields':[{'field':'"+ (String)globalMap.get("field") +"'}] }}"
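As a minimal sketch of the last form, the Java snippet below assembles the same query text outside the Studio. A plain HashMap stands in for the Job's globalMap, the entity name is passed as a method argument in place of context.myEntity, and the class name, method name and sample values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class QueryTextSketch {
    // Concatenates the entity name and the field path into the query text,
    // mirroring the expression shown in the list above.
    static String buildQuery(String entity, Map<String, Object> globalMap) {
        return "{'select':{'from':['" + entity + "'],"
             + "'fields':[{'field':'" + (String) globalMap.get("field") + "'}] }}";
    }

    public static void main(String[] args) {
        // Stand-in for the Job's globalMap; the stored value is a sample only.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("field", "Product/Name");

        System.out.println(buildQuery("Product", globalMap));
    }
}
```

Because the values are spliced into the text as-is, any single quote inside them would break the query, which is why the context variable in the list above is described as pre-escaped.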

To achieve better performance, use the query text to select specific fields rather than use the Retrieve raw data option without defining any field in the query text.

Once you have entered the query text, make sure to set the schema correctly based on the query text. For more information, see How to set the schema correctly based on the query text when using tMDMRestInput.

Warning

You need to select the Retrieve raw data check box only if no field is defined in the query text.

 

Die on error

Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the rows in error and complete the process for error-free rows. If needed, you can retrieve the rows in error via a Row > Rejects link.

Advanced settings

Batch Size

Number of lines in each processed batch.

When the number of records returned by the current query is greater than the batch size, the records are paginated and retrieved batch by batch.
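The arithmetic behind this pagination can be sketched as follows. This only illustrates how a record count splits into batches. How the component itself issues paged requests is not described here, and the class, the method names and the sample numbers are assumptions.

```java
class BatchSketch {
    // Number of batches needed to fetch totalRecords with the given batch size.
    static int batchCount(int totalRecords, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (totalRecords + batchSize - 1) / batchSize; // ceiling division
    }

    // Zero-based offset of the first record in the given batch.
    static int startOffset(int batchIndex, int batchSize) {
        return batchIndex * batchSize;
    }

    public static void main(String[] args) {
        int total = 2500; // hypothetical number of matching records
        int size = 1000;  // hypothetical Batch Size setting

        for (int i = 0; i < batchCount(total, size); i++) {
            int start = startOffset(i, size);
            int end = Math.min(start + size, total);
            System.out.println("batch " + i + ": records " + start + " to " + (end - 1));
        }
    }
}
```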

 

tStatCatcher Statistics

Select this check box to gather the processing metadata at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

NB_LINE: the number of rows processed. This is an After variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
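As an illustration, the sketch below reads both variables the way code in a downstream component might, assuming the component is labelled tMDMRestInput_1 so that the keys are tMDMRestInput_1_NB_LINE and tMDMRestInput_1_ERROR_MESSAGE. A plain HashMap stands in for the Job's globalMap, and the helper class and sample values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVarSketch {
    // Builds a one-line summary from the After variables of one component.
    static String summarize(Map<String, Object> globalMap, String componentName) {
        Integer nbLine = (Integer) globalMap.get(componentName + "_NB_LINE");
        String error = (String) globalMap.get(componentName + "_ERROR_MESSAGE");
        if (error != null) {
            return componentName + " failed: " + error;
        }
        // NB_LINE may be absent if the component has not finished yet.
        return componentName + " read " + (nbLine == null ? 0 : nbLine) + " row(s)";
    }

    public static void main(String[] args) {
        // Stand-in for the Job's globalMap, filled with a sample value.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tMDMRestInput_1_NB_LINE", 3);

        System.out.println(summarize(globalMap, "tMDMRestInput_1"));
    }
}
```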

Usage

tMDMRestInput can be used along with tMDMConnection, tMDMCommit, and tMDMRollback.

tMDMRestInput needs an output link.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

How to set the schema correctly based on the query text when using tMDMRestInput

When using the component tMDMRestInput, you can use the query language to narrow down the data records to be retrieved. For more information, see Talend Help Center (https://help.talend.com).

Based on the query, you need to set the schema correctly for the retrieved data.

Pay attention to the following cases:

  • When a query only counts how many results it returns, you need to define one and only one column, named count, in the schema.

  • When a query gets a metadata field, you need to define a column with the same name as the metadata field in the schema.

  • When a query gets one or more fields, you need to define one or more columns whose names are the same as the returned fields in the schema.

  • When a query uses an alias, you need to define a column with the same name as the alias in the schema.

For example, if a query text gets the following fields, you need to define columns in the schema correspondingly: id, price, timestamp, taskid and productname.

"{
  'select': {
    'from': ['Product'],
    'fields': [
      {'field': 'Product/Id'},
      {'field': 'Product/Price'},
      {'metadata': 'timestamp'},
      {'metadata': 'task_id'},
      {'alias': [{'name': 'ProductName'}, {'field': 'Product/Name'}]}
    ]
  }
}"
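The mapping from the selected items to the column names listed above can be sketched in Java as follows. The helper is hypothetical, and its normalisation (last path segment, lower case, underscores removed) is only inferred from the column list in this example, not taken from a documented rule, so check the exact spelling against your own query results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SchemaColumnSketch {
    // Hypothetical helper: derives a schema column name from one selected item.
    // kind is "field", "metadata" or "alias"; value is the path or name used in the query.
    static String columnFor(String kind, String value) {
        String name = value;
        if (kind.equals("field")) {
            // Keep only the last path segment, for example Product/Id gives Id.
            name = value.substring(value.lastIndexOf('/') + 1);
        }
        // Assumed normalisation, inferred from the example's column list.
        return name.replace("_", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[][] selected = {
            {"field", "Product/Id"},
            {"field", "Product/Price"},
            {"metadata", "timestamp"},
            {"metadata", "task_id"},
            {"alias", "ProductName"}
        };

        List<String> columns = new ArrayList<>();
        for (String[] item : selected) {
            columns.add(columnFor(item[0], item[1]));
        }
        System.out.println(columns); // [id, price, timestamp, taskid, productname]
    }
}
```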

Scenario: Reading data from an MDM hub through the REST API

This scenario describes a two-component Job that reads data from a business entity in the MDM server through the REST API.

In this example, we assume that you have already imported the MDM demo project. For further information about how to import a demo project, see Talend Studio User Guide.

This Job fetches data records that pertain to the Product entity of the Product data container in the MDM demo project.

Dropping and linking the components

  1. From the Palette, drop tMDMRestInput and tLogRow onto the design workspace.

  2. Connect the components using a Row > Main link.

Configuring the components

  1. Double-click tMDMRestInput to view its Basic settings in the Component tab.

  2. In the Schema list, select Built-In and then click the [...] button next to Edit schema to open a dialog box in which you can define the structure of the retrieved data.

    In this example, we will extract the four elements of the product information defined in the Product data model into the four fields: id, name, description and price.

  3. Click the [+] button and add four columns of the type String.

    The data records retrieved from the MDM server need to be mapped into a correct schema. For more information, see How to set the schema correctly based on the query text when using tMDMRestInput.

  4. Click OK to validate your changes.

    The [Propagate] dialog box pops up. Click Yes to propagate your changes.

  5. In the URL field, enter the URL to access the MDM server through the REST API. In this example, leave it as default.

  6. In the Username and Password fields, enter the credentials to access the MDM server.

  7. In the Data Container field, enter the name of the container which holds the data you want to retrieve, Product in this example.

    Then, select Master from the Type list.

  8. In the Query Text area, enter the query you want to include in the REST API calls for retrieving the data records of interest. The entire query text is enclosed with double quotes.

    In this example, enter the following to retrieve the product record(s) with a price greater than 500:

    "{
      'select': {
        'from': ['Product'],
        'fields': [
          {'field': 'Product/Id'},
          {'field': 'Product/Name'},
          {'field': 'Product/Description'},
          {'field': 'Product/Price'}
        ],
        'where': {
          'gt': [
            {'field': 'Product/Price'},
            {'value': '500'}
          ]
        }
      }
    }"

  9. Double-click tLogRow and then select Table from the Mode list.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Execute the Job by pressing F6 or clicking Run on the Run tab.

    All data records of the Product entity in the Product data container that have a price greater than 500 are retrieved and displayed on the console.