tXMLMap - 6.3

Talend Open Studio for Big Data Components Reference Guide

Function

tXMLMap is an advanced component fine-tuned for transforming and routing XML data flows (data of the Document type), especially when processing numerous XML data sources, with or without flat data to be joined.

Purpose

tXMLMap transforms and routes data from single or multiple sources to single or multiple destinations.

tXMLMap properties

Component family

Processing/XML

Basic settings

Map Editor

It allows you to define the tXMLMap routing and transformation properties.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. If the component has a Die on error check box, this variable functions only when that check box is cleared.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
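In Talend-generated Java code, component variables such as ERROR_MESSAGE are usually read from the globalMap object, for example in a tJava component that runs after tXMLMap. The minimal sketch below simulates globalMap with a HashMap so that it runs on its own. The component name tXMLMap_1, the key format <component>_<variable> and the class ErrorMessageDemo are assumptions for illustration; check the actual variable names offered by Ctrl + Space in your own Job.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a HashMap stands in for Talend's globalMap,
// and "tXMLMap_1" is an assumed component name.
class ErrorMessageDemo {

    // Builds the assumed key for a component variable, e.g. "tXMLMap_1_ERROR_MESSAGE".
    static String key(String componentName, String variable) {
        return componentName + "_" + variable;
    }

    // Reads the After variable; returns null when no error message was stored.
    static String readErrorMessage(Map<String, Object> globalMap, String componentName) {
        return (String) globalMap.get(key(componentName, "ERROR_MESSAGE"));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Simulates a failure recorded while Die on error is cleared.
        globalMap.put(key("tXMLMap_1", "ERROR_MESSAGE"), "Premature end of file.");
        String msg = readErrorMessage(globalMap, "tXMLMap_1");
        System.out.println(msg == null ? "no error" : "tXMLMap_1 failed: " + msg);
    }
}
```

Because ERROR_MESSAGE is an After variable, code that reads it belongs in a component that executes after tXMLMap has finished, not inside the same flow.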

Usage

Possible uses range from a simple reorganization of fields to the most complex Jobs of data multiplexing or demultiplexing, transformation, concatenation, inversion, filtering and so on.

When needed, you can define a sophisticated output strategy for the output XML flows using group elements, aggregate elements, empty elements and many other features such as All in one. For further information about these features, see Talend Studio User Guide.

It is used as an intermediate component and is well suited to processes requiring many XML data sources, such as ESB request-response processes.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

The limitations to be kept in mind are:

- Using this component requires a minimum of Java and XML knowledge in order to fully exploit its functionalities.

- This component is a junction step, and for this reason it can be neither a start nor an end component in the Job.

- At least one loop element is required for each XML data flow involved.
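To illustrate what a loop element means, the sketch below uses the standard JDK XPath API (not Talend's generated code) to count the nodes that a loop path selects; each selected node corresponds to one loop iteration. The class LoopElementDemo is hypothetical, and the inline XML is a trimmed version of the Customer.xml file used in the scenario below.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration: one loop iteration per node matched by the loop path.
class LoopElementDemo {

    static int countIterations(String xml, String loopPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList loopNodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(loopPath, doc, XPathConstants.NODESET);
            return loopNodes.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate loop path " + loopPath, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Customers><Customer><Name><id>1</id></Name></Customer>"
                + "<Customer><Name><id>2</id></Name></Customer></Customers>";
        System.out.println(countIterations(xml, "/Customers/Customer")); // prints 2
    }
}
```

Without a loop element there is nothing to iterate on, which is why each XML flow needs one.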

The following sections present several generic use cases of the tXMLMap component. For specific examples that use this component together with the ESB components to build data services, see the scenarios for the ESB components.

If you need further information about the principles of mapping multiple input and output flows, see Talend Studio User Guide.

Scenario 1: Mapping and transforming XML data

The following scenario creates a three-component Job that maps and transforms data from the XML source file Customer.xml and, based on the XML tree structure of the file Customer_State.xml, generates an XML output flow that could later be reused for various purposes, such as an ESB request.

These three components are:

  • tFileInputXML: provides the input data to tXMLMap.

  • tXMLMap: maps and transforms the received XML data flows into one single XML data flow.

  • tLogRow: displays the output data.

The content of the XML file Customer.xml is as follows:

<?xml version="1.0" encoding="ISO-8859-15"?>
<Customers>
	<Customer RegisterTime="2001-01-17 06:26:40.000">
		<Name>
			<id>1</id>
			<CustomerName>Griffith Paving and Sealcoatin</CustomerName>
		</Name>
		<Address>
			<CustomerAddress>talend@apres91</CustomerAddress>
			<idState>2</idState>
		</Address>
		<Revenue>
			<Sum1>67852</Sum1>
		</Revenue>
	</Customer>
	<Customer RegisterTime="2002-06-07 09:40:00.000">
		<Name>
			<id>2</id>
			<CustomerName>Bill's Dive Shop</CustomerName>
		</Name>
		<Address>
			<CustomerAddress>511 Maple Ave. Apt. 1B</CustomerAddress>
			<idState>3</idState>
		</Address>
		<Revenue>
			<Sum1>88792</Sum1>
		</Revenue>
	</Customer>
	<Customer RegisterTime="1987-02-23 17:33:20.000">
		<Name>
			<id>3</id>
			<CustomerName>Glenn Oaks Office Supplies</CustomerName>
		</Name>
		<Address>
			<CustomerAddress>1859 Green Bay Rd.</CustomerAddress>
			<idState>2</idState>
		</Address>
		<Revenue>
			<Sum1>1225.</Sum1>
		</Revenue>
	</Customer>
	<Customer RegisterTime="1992-04-28 23:26:40.000">
		<Name>
			<id>4</id>
			<CustomerName>DBN Bank</CustomerName>
		</Name>
		<Address>
			<CustomerAddress>456 Grossman Ln.</CustomerAddress>
			<idState>3</idState>
		</Address>
		<Revenue>
			<Sum1>64493</Sum1>
		</Revenue>
	</Customer>
</Customers>

The content of the XML file Customer_State.xml is as follows:

<?xml version="1.0" encoding="ISO-8859-15"?>
<customers>
	<customer id="1">
		<CustomerName>Griffith Paving and Sealcoatin</CustomerName>
		<CustomerAddress>talend@apres91</CustomerAddress>
		<idState>2</idState>
	</customer>
	<customer id="2">
		<CustomerName>Bill's Dive Shop</CustomerName>
		<CustomerAddress>511 Maple Ave.  Apt. 1B</CustomerAddress>
		<idState>3</idState>
	</customer>
</customers>
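When Customer_State.xml is imported into tXMLMap with Import From File, it serves as the template for the output XML tree structure. As a rough illustration of the kind of structure such an import derives, the sketch below lists each distinct element and attribute path of an XML string using the JDK DOM parser. This is an assumption-laden approximation, not Talend's import logic, and the class TreeStructureDemo is hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch: collects the distinct element and attribute paths of a document.
class TreeStructureDemo {

    static Set<String> paths(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Set<String> out = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
            collect(root, "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    private static void collect(Element e, String parent, Set<String> out) {
        String path = parent + "/" + e.getTagName();
        out.add(path);
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            out.add(path + "/@" + attrs.item(i).getNodeName()); // attributes shown as @name
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                collect((Element) c, path, out); // text and whitespace nodes are skipped
            }
        }
    }
}
```

Applied to Customer_State.xml, this yields /customers, /customers/customer, /customers/customer/@id and one path each for CustomerName, CustomerAddress and idState; the repeated customer elements collapse into a single set of paths.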

Adding and linking the components

  1. Create a new Job and add a tFileInputXML component, a tXMLMap component and a tLogRow component by typing their names in the design workspace or dropping them from the Palette.

  2. Label the tFileInputXML component Customers to better identify its function.

    Note

    A component used in the workspace can be labelled the way you need. For further information about how to label a component, see Talend Studio User Guide.

  3. Link the tFileInputXML component labelled Customers to the tXMLMap component using a Row > Main connection.

  4. Link the tXMLMap component to the tLogRow component using a Row > *New Output* (Main) connection. In the pop-up dialog box, enter the name of the output connection, Customer in this scenario.

Configuring the input flow

  1. Double-click the tFileInputXML component labelled Customers to open its Basic settings view.

  2. Click the [...] button next to Edit schema and, in the [Schema] dialog box, define the schema by adding one column, Customer, of the Document type.

    Note that the Document data type is essential for making full use of tXMLMap. For further information about this data type, see Talend Studio User Guide.

  3. Click OK to validate the changes and close the dialog box. One row is added automatically to the Mapping table.

  4. In the File name/Stream field, browse to, or type in between double quotation marks, the path to the XML source file that provides the customer data. In this scenario, it is E:/Customer.xml.

  5. In the Loop XPath query field, type in an XPath expression between double quotation marks to specify the node on which the loop is based. In this scenario, it is /, which means the loop query is performed from the root.

  6. In the XPath query column of the Mapping table, type in the fields to be queried between double quotation marks. In this scenario, it is ., which means all fields under the current node (root) will be extracted.

  7. In the Get Nodes column of the Mapping table, select the check box.

    Note

    In order to build the Document type data flow, it is necessary to get the nodes from this component.
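As a rough illustration of what the settings above produce, the sketch below approximates tFileInputXML with Get Nodes selected using the JDK XPath and Transformer APIs; it is not Talend's generated code, and the class InputFlowDemo is hypothetical. Each node matched by the loop path becomes one row, and the column value is the serialized XML of the node selected relative to it. With the loop path / and the column path ., the whole document ends up in a single row, which is what the Customer column of the Document type receives.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical approximation of "Loop XPath query" + "XPath query" with Get Nodes selected.
class InputFlowDemo {

    static List<String> rows(String xml, String loopPath, String columnPath) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList loopNodes = (NodeList) xpath.evaluate(loopPath, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < loopNodes.getLength(); i++) {
                // columnPath is evaluated relative to the current loop node.
                Node selected = (Node) xpath.evaluate(columnPath, loopNodes.item(i), XPathConstants.NODE);
                result.add(serialize(selected));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read XML", e);
        }
    }

    // Keeps the XML markup of the node, as Get Nodes does, rather than only its text.
    private static String serialize(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<Customers><Customer><id>1</id></Customer><Customer><id>2</id></Customer></Customers>";
        System.out.println(rows(xml, "/", ".").size());                  // prints 1
        System.out.println(rows(xml, "/Customers/Customer", ".").size()); // prints 2
    }
}
```

Looping on the root keeps the document in one piece so that tXMLMap can define its own loop element on it, as done in the next section.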

Configuring tXMLMap for transformation

  1. Double-click the tXMLMap component to open its Map Editor.

    Note that the input area is already filled with the default basic XML structure and the top table is the main input table.

  2. In the row1 input table, right-click the Customer node and from the contextual menu select Import From File. In the pop-up dialog box, browse to the XML source file from which to import the XML tree structure of the data that tXMLMap will receive. In this scenario, the XML source file is Customer.xml, which is the input data to the tFileInputXML component labelled Customers.

    Note

    You can also import an XML tree from an XSD file. When importing either an input or an output XML tree structure from an XSD file, you can choose an element as the root of your XML tree. For more information on importing an XML tree structure from an XSD file, see Talend Studio User Guide.

  3. In the imported XML tree, right-click the Customer node and from the contextual menu select As loop element to set it as the loop element.

  4. In the lower part of the Map Editor, click the Schema editor tab to display the corresponding view. Then, on the right side of this view, add one column, Customer_States, of the Document type to the Customer schema table. The corresponding XML root is automatically added to the Customer output table at the top right, which represents the output flow.

  5. In the Customer output table, right-click the Customer_States node and from the contextual menu select Import From File. In the pop-up dialog box, browse to the XML file from which the XML tree structure is imported. In this scenario, it is Customer_State.xml.

  6. Right-click the customer node and from the contextual menu select As loop element to set it as the loop element.

  7. In the row1 input table, drag the id node and drop it onto the Expression column in the row of the @id node in the Customer output table.

    Do the same to map CustomerName to CustomerName, CustomerAddress to CustomerAddress, and idState to idState from the input table to the output table.

    Note

    In some circumstances, you may have to keep empty elements in your output XML tree. If so, you can use tXMLMap to manage them. For further information about how to manage empty elements using tXMLMap, see Talend Studio User Guide.

  8. At the top of the Customer output table, click the wrench icon and set the value of the All in one property to true to generate a single XML flow. For further information about the All in one feature, see Talend Studio User Guide.
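The mapping configured in the steps above can be approximated in plain Java with the JDK DOM API. This is a hypothetical sketch, not the code Talend generates: the class CustomerStateMapper is invented, and its allInOne flag only mimics the effect of the All in one property (true: one output document for all loop elements; false: one document per loop element).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical plain-Java approximation of this scenario's tXMLMap mapping.
class CustomerStateMapper {

    private static final String[] COPIED = {"CustomerName", "CustomerAddress", "idState"};

    // Maps each input Customer loop element to an output customer loop element.
    static List<String> map(String customerXml, boolean allInOne) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document in = builder.parse(new InputSource(new StringReader(customerXml)));
            NodeList loop = in.getElementsByTagName("Customer"); // input loop element
            List<String> outputs = new ArrayList<>();
            Document out = null;
            for (int i = 0; i < loop.getLength(); i++) {
                Element src = (Element) loop.item(i);
                if (out == null) { // start a new output document with its root
                    out = builder.newDocument();
                    out.appendChild(out.createElement("customers"));
                }
                Element dst = out.createElement("customer"); // output loop element
                dst.setAttribute("id", text(src, "id"));     // id -> @id
                for (String name : COPIED) {                 // same-name element mappings
                    Element child = out.createElement(name);
                    child.setTextContent(text(src, name));
                    dst.appendChild(child);
                }
                out.getDocumentElement().appendChild(dst);
                if (!allInOne) { // close the document after each loop element
                    outputs.add(serialize(out));
                    out = null;
                }
            }
            if (out != null) {
                outputs.add(serialize(out));
            }
            return outputs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot map input XML", e);
        }
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    private static String text(Element parent, String tag) {
        Node n = parent.getElementsByTagName(tag).item(0);
        return n == null ? "" : n.getTextContent();
    }

    private static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(w));
        return w.toString();
    }
}
```

With the four Customer elements of Customer.xml as input, map(xml, true) returns a single document containing four customer elements, whereas map(xml, false) returns four documents with one customer element each.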