tXMLMap - 6.1

Talend Components Reference Guide

Version: 6.1
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Open Studio for Big Data, Talend Open Studio for Data Integration, Talend Open Studio for Data Quality, Talend Open Studio for ESB, Talend Open Studio for MDM, Talend Real-Time Big Data Platform
Tasks: Data Governance, Data Quality and Preparation, Design and Development
Platform: Talend Studio

Function

tXMLMap is an advanced component fine-tuned for transforming and routing XML data flows (data of the Document type), especially when processing numerous XML data sources, with or without flat data to be joined.

Purpose

tXMLMap transforms and routes data from single or multiple sources to single or multiple destinations.

tXMLMap properties

Component family

Processing/XML

 

Basic settings

Map Editor

It allows you to define the tXMLMap routing and transformation properties.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.

Usage

Possible uses range from a simple reorganization of fields to complex Jobs involving data multiplexing or demultiplexing, transformation, concatenation, inversion, filtering and so on.

When needed, you can define a sophisticated output strategy for the output XML flows using group elements, aggregate elements, empty elements and other features such as All in one. For further information about these features, see Talend Studio User Guide.

tXMLMap is used as an intermediate component and is well suited to processes that require many XML data sources, such as ESB request-response processes.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

The limitations to be kept in mind are:

- Using this component requires basic Java and XML knowledge in order to fully exploit its functionalities.

- This component is a junction step, so it can be neither a start nor an end component in the Job.

- At least one loop element is required for each XML data flow involved.

The following sections present several generic use cases for the tXMLMap component. For specific examples that use this component along with the ESB components to build data services, see the scenarios for the ESB components.

If you need further information about the principles of mapping multiple input and output flows, see Talend Studio User Guide.

Scenario 1: Mapping and transforming XML data

The following scenario creates a three-component Job that maps and transforms data from the XML source file Customer.xml and generates an XML output flow based on the XML tree structure of the file Customer_State.xml. This output flow can later be reused for various purposes, such as an ESB request.

These three components are:

  • tFileInputXML: provides the input data to tXMLMap.

  • tXMLMap: maps and transforms the received XML data flows into one single XML data flow.

  • tLogRow: displays the output data.

The content of the XML file Customer.xml is as follows:

<?xml version="1.0" encoding="ISO-8859-15"?>
<Customers>
	<Customer RegisterTime="2001-01-17 06:26:40.000">
		<Name>
			<id>1</id>
			<CustomerName>Griffith Paving and Sealcoatin</CustomerName>
		</Name>
		<Address>
			<CustomerAddress>talend@apres91</CustomerAddress>
			<idState>2</idState>
		</Address>
		<Revenue>
			<Sum1>67852</Sum1>
		</Revenue>
	</Customer>
	<Customer RegisterTime="2002-06-07 09:40:00.000">
		<Name>
			<id>2</id>
			<CustomerName>Bill's Dive Shop</CustomerName>
		</Name>
		<Address>
			<CustomerAddress>511 Maple Ave. Apt. 1B</CustomerAddress>
			<idState>3</idState>
		</Address>
		<Revenue>
			<Sum1>88792</Sum1>
		</Revenue>
	</Customer>
	<Customer RegisterTime="1987-02-23 17:33:20.000">
		<Name>
			<id>3</id>
			<CustomerName>Glenn Oaks Office Supplies</CustomerName>
		</Name>
		<Address>
			<CustomerAddress>1859 Green Bay Rd.</CustomerAddress>
			<idState>2</idState>
		</Address>
		<Revenue>
			<Sum1>1225.</Sum1>
		</Revenue>
	</Customer>
	<Customer RegisterTime="1992-04-28 23:26:40.000">
		<Name>
			<id>4</id>
			<CustomerName>DBN Bank</CustomerName>
		</Name>
		<Address>
			<CustomerAddress>456 Grossman Ln.</CustomerAddress>
			<idState>3</idState>
		</Address>
		<Revenue>
			<Sum1>64493</Sum1>
		</Revenue>
	</Customer>
</Customers>

The content of the XML file Customer_State.xml is as follows:

<?xml version="1.0" encoding="ISO-8859-15"?>
<customers>
	<customer id="1">
		<CustomerName>Griffith Paving and Sealcoatin</CustomerName>
		<CustomerAddress>talend@apres91</CustomerAddress>
		<idState>2</idState>
	</customer>
	<customer id="2">
		<CustomerName>Bill's Dive Shop</CustomerName>
		<CustomerAddress>511 Maple Ave.  Apt. 1B</CustomerAddress>
		<idState>3</idState>
	</customer>
</customers>

Adding and linking the components

  1. Create a new Job and add a tFileInputXML component, a tXMLMap component, and a tLogRow component by typing their names in the design workspace or dropping them from the Palette.

  2. Label the tFileInputXML component Customers to better identify its function.

    Note

    A component used in the workspace can be labelled the way you need. For further information about how to label a component, see Talend Studio User Guide.

  3. Link the tFileInputXML component labelled Customers to the tXMLMap component using a Row > Main connection.

  4. Link the tXMLMap component to the tLogRow component using a Row > *New Output* (Main) connection. In the pop-up dialog box, enter the name of the output connection, Customer in this scenario.

Configuring the input flow

  1. Double-click the tFileInputXML component labelled Customers to open its Basic settings view.

  2. Click the [...] button next to Edit schema and in the [Schema] dialog box define the schema by adding one column Customer of Document type.

    Note that the Document data type is essential for making full use of tXMLMap. For further information about this data type, see Talend Studio User Guide.

  3. Click OK to validate the changes and close the dialog box. One row is added automatically to the Mapping table.

  4. In the File name/Stream field, browse to or type in between double quotation marks the path to the XML source file that provides the customer data. In this scenario, it is E:/Customer.xml.

  5. In the Loop XPath query field, type in an XPath expression between double quotation marks to specify the node on which the loop is based. In this scenario, it is "/", which means the loop query is performed from the root.

  6. In the XPath query column of the Mapping table, type in the fields to be queried between double quotation marks. In this scenario, it is ".", which means all fields under the current node (the root) will be extracted.

  7. In the Get Nodes column of the Mapping table, select the check box.

    Note

    In order to build the Document type data flow, it is necessary to get the nodes from this component.
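
The effect of these settings can be pictured with a short Python sketch. It is only an analogy built on the standard xml.etree.ElementTree module, not Talend code: the SAMPLE string is a trimmed, hypothetical stand-in for Customer.xml, and read_as_single_document is an illustrative name. It shows the idea that looping on the root and keeping the nodes yields one row whose single column holds the whole tree.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for Customer.xml (not the full file).
SAMPLE = """<Customers>
  <Customer><Name><id>1</id></Name></Customer>
  <Customer><Name><id>2</id></Name></Customer>
</Customers>"""


def read_as_single_document(xml_text):
    """Mimic a loop on "/" with the query "." and the nodes kept:
    the loop runs once, at the root, and the whole tree stays one value."""
    root = ET.fromstring(xml_text)
    # One row with one column (named Customer in the scenario).
    return [{"Customer": root}]


rows = read_as_single_document(SAMPLE)
print(len(rows))                                      # number of rows
print(rows[0]["Customer"].tag)                        # root of the document
print(len(rows[0]["Customer"].findall("Customer")))   # nested customers
```

The nested Customer elements are not split into rows at this stage; in the scenario that happens later, when a loop element is chosen in the Map Editor.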

Configuring tXMLMap for transformation

  1. Double-click the tXMLMap component to open its Map Editor.

    Note that the input area is already filled with the default basic XML structure and the top table is the main input table.

  2. In the row1 input table, right-click the Customer node and from the contextual menu select Import From File. In the pop-up dialog box, browse to the XML source file whose tree structure matches the data to be received by tXMLMap, and import that structure. In this scenario, the XML source file is Customer.xml, which is the input data to the tFileInputXML component labelled Customers.

    Note

    You can also import an XML tree from an XSD file. When importing either an input or an output XML tree structure from an XSD file, you can choose an element as the root of your XML tree. For more information on importing an XML tree structure from an XSD file, see Talend Studio User Guide.

  3. In the imported XML tree, right-click the Customer node and from the contextual menu select As loop element to set it as the loop element.

  4. In the lower part of the map editor, click the Schema editor tab to display the corresponding view. Then on the right side of this view, add one column Customer_States of Document type to the Customer schema table. The corresponding XML root is added automatically to the Customer output table on the top right side, which represents the output flow.

  5. In the Customer output table, right-click the Customer_States node and from the contextual menu select Import From File. In the pop-up dialog box, browse to the XML file from which the XML tree structure is imported. In this scenario, it is Customer_State.xml.

  6. Right-click the customer node and from the contextual menu select As loop element to set it as the loop element.

  7. In the row1 input table, click the id node and drop it to the Expression column in the row of the @id node in the Customer output table.

    Do the same to map CustomerName to CustomerName, CustomerAddress to CustomerAddress, and idState to idState from the input table to the output table.

    Note

    In some circumstances, you may have to keep empty elements in your output XML tree. If so, you can use tXMLMap to manage them. For further information about how to manage empty elements using tXMLMap, see Talend Studio User Guide.

  8. On the top of the Customer output table, click the wrench icon and set the value of the All in one property to true to generate a single XML flow. For further information about the All in one feature, see Talend Studio User Guide.

  9. Click OK to validate the changes and close the Map Editor.

    Note

    If you close the Map Editor without having set the required loop elements as described earlier in this scenario, the root element will be automatically set as the loop element.
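
The mapping configured above can be sketched in Python to make the restructuring explicit. This is a minimal illustration using xml.etree.ElementTree, not what tXMLMap generates: SOURCE is an abbreviated copy of the first two customers and map_customers is a hypothetical helper name. Each Customer element (the input loop element) produces one customer element (the output loop element), and all of them are collected under one root, which loosely corresponds to the single XML flow the scenario asks for.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the first two customers from Customer.xml.
SOURCE = """<Customers>
  <Customer RegisterTime="2001-01-17 06:26:40.000">
    <Name><id>1</id><CustomerName>Griffith Paving and Sealcoatin</CustomerName></Name>
    <Address><CustomerAddress>talend@apres91</CustomerAddress><idState>2</idState></Address>
    <Revenue><Sum1>67852</Sum1></Revenue>
  </Customer>
  <Customer RegisterTime="2002-06-07 09:40:00.000">
    <Name><id>2</id><CustomerName>Bill's Dive Shop</CustomerName></Name>
    <Address><CustomerAddress>511 Maple Ave. Apt. 1B</CustomerAddress><idState>3</idState></Address>
    <Revenue><Sum1>88792</Sum1></Revenue>
  </Customer>
</Customers>"""


def map_customers(xml_text):
    """Rebuild the input tree in the shape of Customer_State.xml."""
    source_root = ET.fromstring(xml_text)
    target_root = ET.Element("customers")
    for src in source_root.findall("Customer"):        # input loop element
        # id becomes an attribute of the output loop element.
        dst = ET.SubElement(target_root, "customer", id=src.findtext("Name/id"))
        ET.SubElement(dst, "CustomerName").text = src.findtext("Name/CustomerName")
        ET.SubElement(dst, "CustomerAddress").text = src.findtext("Address/CustomerAddress")
        ET.SubElement(dst, "idState").text = src.findtext("Address/idState")
    return target_root


result = map_customers(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```

Elements that are not mapped, such as Revenue and the RegisterTime attribute, simply do not appear in the output tree.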

Configuring tLogRow to display the customer information

  1. Double-click the tLogRow component to open its Basic settings view.

  2. Click the Sync columns button to retrieve the schema from its preceding component.

Saving and executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 to execute the Job.

    The transformed customer information is displayed on the console.

Scenario 2: Launching a lookup flow to join complementary data

Based on the previous scenario, this scenario shows how to use a lookup flow to join data of interest in the XML file USState.xml to the main flow. Another tFileInputXML component is added to the Job to load data from the lookup file USState.xml to the processing component tXMLMap.

The content of the XML file USState.xml is as follows:

<?xml version="1.0" encoding="ISO-8859-15"?>
<USStates>
  <States>
    <idState>1</idState>
    <LabelState>Alabama</LabelState>
  </States> 
  <States>
    <idState>2</idState>
    <LabelState>Connecticut</LabelState>
  </States>
  <States>
    <idState>3</idState>
    <LabelState>Ohio</LabelState>
  </States>  
  <States>
    <idState>4</idState>
    <LabelState>Wyoming</LabelState>
  </States>
    <States>
    <idState>5</idState>
    <LabelState>Hawaii</LabelState>
  </States>
</USStates>

Adding and linking another input component

  1. In your Studio, open the Job used in the previous scenario to display it in the design workspace.

  2. Add another tFileInputXML component to the Job by typing its name in the design workspace or dropping it from the Palette. Label the component USStates to better identify its function.

  3. Link the tFileInputXML component labelled USStates to the tXMLMap component using a Row > Main connection. The connection is automatically changed to a lookup flow.

Configuring the input flow for lookup

  1. Double-click the tFileInputXML component labelled USStates to open its Basic settings view.

  2. Click the [...] button next to Edit schema and in the [Schema] dialog box define the schema by adding one column USState of Document type.

  3. Click OK to validate the changes and close the dialog box. One row is added automatically to the Mapping table.

  4. In the File name/Stream field, browse to or type in between double quotation marks the path to the XML source file that holds the complementary data. In this scenario, it is E:/USState.xml.

  5. In the Loop XPath query field, type in an XPath expression between double quotation marks to specify the node on which the loop is based. In this scenario, it is "/", which means the loop query is performed from the root.

  6. In the XPath query column of the Mapping table, type in the fields to be queried between double quotation marks. In this scenario, it is ".", which means all fields under the current node (the root) will be extracted.

  7. In the Get Nodes column of the Mapping table, select the check box. This retrieves the XML structure for the Document type data.

Configuring tXMLMap for transformation

  1. Double-click the tXMLMap component to open its Map Editor.

    Note that the input area is already filled with the defined input tables and the top table is the main input table.

  2. In the row2 input table, right-click the USState node and from the contextual menu select Import From File. In the pop-up dialog box, browse to the XML source file whose tree structure matches the data to be received by tXMLMap, and import that structure. In this scenario, the XML source file is USState.xml, which is the input data to the tFileInputXML component labelled USStates.

  3. In the imported XML tree, right-click the States node and from the contextual menu select As loop element to set it as the loop element.

  4. In the row1 main input table, click the idState node and drop it to the Exp.key column in the row of the idState node in the row2 lookup input table. This creates a join between the two input tables on the idState data, with the idState node from the main flow providing the lookup key.

  5. In the row2 lookup input table, click the LabelState node and drop it on the customer node in the Customer output table. A dialog box pops up.

  6. In the pop-up dialog box, select Create as sub-element of target node and click OK. A new LabelState sub-element is added to the output XML tree and mapped with the LabelState node in the lookup input table.

  7. Click OK to validate the mappings and close the Map Editor.
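
The join set up in these steps can be illustrated with a small Python sketch. It is an analogy only: the two XML strings are abbreviated copies of the scenario files, join_states is a hypothetical name, and the choice to keep unmatched customers with an empty label is an assumption of this sketch. The join model actually applied by tXMLMap depends on the lookup table settings, which are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Abbreviated copies of Customer.xml and USState.xml.
CUSTOMERS = """<Customers>
  <Customer><Name><id>1</id><CustomerName>Griffith Paving and Sealcoatin</CustomerName></Name><Address><idState>2</idState></Address></Customer>
  <Customer><Name><id>2</id><CustomerName>Bill's Dive Shop</CustomerName></Name><Address><idState>3</idState></Address></Customer>
</Customers>"""

STATES = """<USStates>
  <States><idState>2</idState><LabelState>Connecticut</LabelState></States>
  <States><idState>3</idState><LabelState>Ohio</LabelState></States>
</USStates>"""


def join_states(customers_xml, states_xml):
    """Attach LabelState to each customer, using idState as the lookup key."""
    # Index the lookup flow by its key once, then probe it per main row.
    labels = {
        state.findtext("idState"): state.findtext("LabelState")
        for state in ET.fromstring(states_xml).findall("States")
    }
    joined = []
    for customer in ET.fromstring(customers_xml).findall("Customer"):
        key = customer.findtext("Address/idState")
        joined.append({
            "id": customer.findtext("Name/id"),
            "CustomerName": customer.findtext("Name/CustomerName"),
            "idState": key,
            "LabelState": labels.get(key),  # None when no state matches (assumption)
        })
    return joined


joined_rows = join_states(CUSTOMERS, STATES)
for row in joined_rows:
    print(row["id"], row["CustomerName"], row["LabelState"])
```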

Saving and executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 to run the Job.

    The state names from the lookup file whose state IDs match those in the main input file are added to the data flow, and the combined information is displayed on the console.

A step-by-step tutorial related to this Join topic is available on the Talend Technical Community Site. For further information, see http://talendforge.org/tutorials/tutorial.php?language=english&idTuto=101.

Scenario 3: Mapping data using a filter

Based on Scenario 2: Launching a lookup flow to join complementary data, this scenario shows how to apply one or more filter conditions to select the data of interest using tXMLMap.

  1. In your Studio, open the Job used in the previous scenario to display it in the design workspace.

  2. Double-click the tXMLMap component to open its Map Editor.

  3. On the top of the Customer output table, click the button to open the filter area.

  4. Drop the idState node in the main input table to the filter area. The XPath [row1.Customer:/Customers/Customer/Address/idState] of the idState node is added automatically to this filter area.

    Enter == 2 after the XPath of the idState node, and the complete filter condition becomes [row1.Customer:/Customers/Customer/Address/idState] == 2. This means only the customer data with the state id of 2 will be passed to the output flow.

  5. Click OK to validate the changes and close the map editor.

  6. Press Ctrl + S to save the Job and then F6 to run the Job.

    The customers Griffith Paving and Sealcoatin and Glenn Oaks Office Supplies, whose state id is 2, are displayed on the console.
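
The filter condition used in this scenario can be mimicked in Python to check which customers it should keep. This is a sketch under stated assumptions: CUSTOMERS is an abbreviated copy of Customer.xml, filter_by_state is a hypothetical name, and the idState text is converted to an integer before the comparison, which may differ from how the Studio evaluates the expression.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of Customer.xml: only the elements the filter needs.
CUSTOMERS = """<Customers>
  <Customer><Name><CustomerName>Griffith Paving and Sealcoatin</CustomerName></Name><Address><idState>2</idState></Address></Customer>
  <Customer><Name><CustomerName>Bill's Dive Shop</CustomerName></Name><Address><idState>3</idState></Address></Customer>
  <Customer><Name><CustomerName>Glenn Oaks Office Supplies</CustomerName></Name><Address><idState>2</idState></Address></Customer>
  <Customer><Name><CustomerName>DBN Bank</CustomerName></Name><Address><idState>3</idState></Address></Customer>
</Customers>"""


def filter_by_state(xml_text, wanted_state=2):
    """Return the names of customers whose idState equals wanted_state."""
    kept = []
    for customer in ET.fromstring(xml_text).findall("Customer"):
        # Same node as the XPath /Customers/Customer/Address/idState.
        id_state = int(customer.findtext("Address/idState"))
        if id_state == wanted_state:
            kept.append(customer.findtext("Name/CustomerName"))
    return kept


print(filter_by_state(CUSTOMERS))
# ['Griffith Paving and Sealcoatin', 'Glenn Oaks Office Supplies']
```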

Scenario 4: Catching the data rejected by lookup and filter

The data rejected by the lookup and filter conditions set in tXMLMap can be caught and output by this component itself.

Based on Scenario 3: Mapping data using a filter, this scenario presents how to catch the data rejected by the lookup and the filter set up in the previous scenarios. Another tLogRow component is added to the Job used in the previous scenario to display the rejected data.

Adding and linking another output component

  1. In your Studio, open the Job used in the previous scenario to display it in the design workspace.

  2. Add another tLogRow component to the Job by typing its name in the design workspace or dropping it from the Palette.

  3. Link the tXMLMap component to the second tLogRow using a Row > *New Output* (Main) connection. In the pop-up dialog box, enter the name of the output connection, Reject in this example.

Configuring tXMLMap for transformation

  1. Double-click the tXMLMap component to open its Map Editor. An empty Reject output table has been added to the output side to represent the output flow that carries the rejected data.

  2. In the row1 main input table, click the id node and drop it on the Reject output table. A column id is added to the Reject schema table in the Schema editor on the lower part of the map editor.

  3. Do the same to drop CustomerName, CustomerAddress, and idState in the row1 main input table and LabelState in the row2 lookup input table on the Reject output table. Four more columns, CustomerName, CustomerAddress, idState, and LabelState, are added to the Reject schema table in the Schema editor.

    Note

    In this scenario, the Reject output flow uses the flat data type. However, you can create an XML tree view for this flow similar to the Customer output flow using the Document data type. For further information about how to use the Document type, see Scenario 1: Mapping and transforming XML data.

  4. On the top of the Reject output table, click the button to open the property setting area.

  5. Set the value of the Catch Output Reject property to true to catch the data rejected by the filter set up in the previous scenario for the Customer output flow.

  6. Set the value of the Catch Lookup Inner Join Reject property to true to catch the data rejected by the inner join operation.

  7. Click OK to validate the changes and close the map editor.
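
The two reject paths enabled above can be pictured with a short Python sketch. It is an illustration of the routing idea rather than of tXMLMap internals: the rows are flat dictionaries, route is a hypothetical name, and customer 5 is a made-up record added only so that the lookup has something to reject (every customer in Customer.xml has a matching state).

```python
# Flat rows as they might look after reading Customer.xml. Row 5 is a
# hypothetical addition so that the inner join has a row to reject.
CUSTOMERS = [
    {"id": 1, "CustomerName": "Griffith Paving and Sealcoatin", "idState": 2},
    {"id": 2, "CustomerName": "Bill's Dive Shop", "idState": 3},
    {"id": 3, "CustomerName": "Glenn Oaks Office Supplies", "idState": 2},
    {"id": 4, "CustomerName": "DBN Bank", "idState": 3},
    {"id": 5, "CustomerName": "Hypothetical Co.", "idState": 9},
]

# Lookup data taken from USState.xml.
STATE_LABELS = {1: "Alabama", 2: "Connecticut", 3: "Ohio", 4: "Wyoming", 5: "Hawaii"}


def route(customers, labels, wanted_state=2):
    """Split rows into the accepted flow and the reject flow."""
    accepted, rejected = [], []
    for row in customers:
        label = labels.get(row["idState"])
        joined = {**row, "LabelState": label}
        if label is None:
            # No matching state: rejected by the inner join.
            rejected.append(joined)
        elif row["idState"] != wanted_state:
            # Joined, but rejected by the filter on the main output.
            rejected.append(joined)
        else:
            accepted.append(joined)
    return accepted, rejected


accepted, rejected = route(CUSTOMERS, STATE_LABELS)
print([r["id"] for r in accepted])   # [1, 3]
print([r["id"] for r in rejected])   # [2, 4, 5]
```

Both kinds of rejected rows end up in the same list here, which reflects the scenario's single Reject output table with both catch properties set to true.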

Configuring the output flow

  1. Double-click the second tLogRow component to open its Basic settings view.

  2. Click the Sync columns button to retrieve the schema from its preceding component.

  3. In the Mode area, select Table (print values in cells of a table) for better readability of the result.

Saving and executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 to run the Job.

    The data captured as rejected by the filter and the lookup is displayed in the Run view.

    The data whose idState value is 2 is selected by the filter set up in the previous scenario and displayed in the upper part, and the data whose idState value is not 2 is rejected and displayed in the lower part.

Scenario 5: Mapping data using a group element

Based on Scenario 2: Launching a lookup flow to join complementary data, this scenario presents how to set up an element as group element in the Map Editor of tXMLMap to group the output data. For more information about how to group the output data using tXMLMap, see Talend Studio User Guide.

The objective of this scenario is to group the customer id and the customer name information according to the states the customers come from. You need to reconstruct the XML tree view of the Customer output table by considering the following factors:

  • The elements tagging the customer id and the customer name information should be located under the loop element. Thus they are the sub-elements of the loop element.

  • The loop element and its sub-elements should be dependent directly on the group element.

  • The element tagging the state information used as the grouping condition should be dependent directly on the group element.

  • The group element cannot be the root element.

Based on this analysis, the XML structure of the output data should be as follows: the customers node is the root element, the customer node is set as the group element, and the output data is grouped according to the LabelState element.

For a group element to take effect, the XML data to be processed must already be sorted, for example using your XML tools, by the element used as the grouping condition. In this example, the customers with the same state id must be placed together. The input data in the XML file Customer.xml should read as follows:

<?xml version="1.0" encoding="ISO-8859-15"?>
<Customers>
	<Customer RegisterTime="2001-01-17 06:26:40.000">
		<Name>
			<id>1</id>
			<CustomerName>Griffith Paving and Sealcoatin</CustomerName>
		</Name>
		<Address>
			<CustomerAddress>talend@apres91</CustomerAddress>
			<idState>2</idState>
		</Address>
		<Revenue>
			<Sum1>67852</Sum1>
		</Revenue>
	</Customer>
	<Customer RegisterTime="1987-02-23 17:33:20.000">
		<Name>
			<id>3</id>
			<CustomerName>Glenn Oaks Office Supplies</CustomerName>
		</Name>
		<Address>
			<CustomerAddress>1859 Green Bay Rd.</CustomerAddress>
			<idState>2</idState>
		</Address>
		<Revenue>
			<Sum1>1225.</Sum1>
		</Revenue>
	</Customer>
	<Customer RegisterTime="2002-06-07 09:40:00.000">
		<Name>
			<id>2</id>
			<CustomerName>Bill's Dive Shop</CustomerName>
		</Name>
		<Address>
			<CustomerAddress>511 Maple Ave. Apt. 1B</CustomerAddress>
			<idState>3</idState>
		</Address>
		<Revenue>
			<Sum1>88792</Sum1>
		</Revenue>
	</Customer>
	<Customer RegisterTime="1992-04-28 23:26:40.000">
		<Name>
			<id>4</id>
			<CustomerName>DBN Bank</CustomerName>
		</Name>
		<Address>
			<CustomerAddress>456 Grossman Ln.</CustomerAddress>
			<idState>3</idState>
		</Address>
		<Revenue>
			<Sum1>64493</Sum1>
		</Revenue>
	</Customer>
</Customers>

  1. In your Studio, open the Job used in Scenario 2: Launching a lookup flow to join complementary data to display it in the design workspace, and double-click the tXMLMap component to open its Map Editor.

  2. In the XML tree view of the Customer output table, right-click the customer (loop) node and select Delete from the contextual menu. All of the elements under the customers root node are removed, so you can reconstruct the XML tree view used to group the output data of interest.

  3. Right-click the customers root node and select Create Sub-Element from the contextual menu. In the pop-up dialog box, enter the name of the new sub-element. In this example, it is customer.

    Click OK to validate the changes and close the dialog box. A customer node is added under the customers root node in the output table.

  4. In the row2 lookup input table, select the LabelState node and drop it onto the customer node in the output table. In the pop-up dialog box, select Create as sub-element of target node and click OK to close the dialog box. A LabelState node is added under the customer node in the output table.

  5. Right-click the customer node in the output table and select Create Sub-Element from the contextual menu. In the pop-up dialog box, enter the name of the new sub-element. In this example, it is Name.

    Click OK to validate the changes and close the dialog box. A Name node is added under the customer node in the output table.

  6. In the row1 main input table, select the id and CustomerName nodes and drop them onto the Name node in the output table. In the pop-up dialog box, select Create as sub-element of target node and click OK to close the dialog box. An id node and a CustomerName node are added under the Name node in the output table.

  7. In the output table, right-click the Name node and from the contextual menu select As loop element to set it as the loop element. Then right-click the customer node and from the contextual menu select As group element to group the output data according to the LabelState element.
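
The grouped output structure, and the reason the input has to be sorted first, can be illustrated with a Python sketch based on itertools.groupby. This is an analogy and not a description of how tXMLMap is implemented: ROWS is a flat, hypothetical representation of the joined data, and group_by_state is an illustrative name. Like the group element described in this scenario, groupby only merges rows that are next to each other.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Joined rows, already ordered by state as the scenario requires.
ROWS = [
    {"id": "1", "CustomerName": "Griffith Paving and Sealcoatin", "LabelState": "Connecticut"},
    {"id": "3", "CustomerName": "Glenn Oaks Office Supplies", "LabelState": "Connecticut"},
    {"id": "2", "CustomerName": "Bill's Dive Shop", "LabelState": "Ohio"},
    {"id": "4", "CustomerName": "DBN Bank", "LabelState": "Ohio"},
]


def group_by_state(rows):
    """Build customers > customer (group) > LabelState + Name (loop)."""
    root = ET.Element("customers")
    # groupby starts a new group each time the key changes between rows.
    for label, members in groupby(rows, key=lambda r: r["LabelState"]):
        group = ET.SubElement(root, "customer")           # group element
        ET.SubElement(group, "LabelState").text = label   # grouping condition
        for row in members:
            name = ET.SubElement(group, "Name")           # loop element
            ET.SubElement(name, "id").text = row["id"]
            ET.SubElement(name, "CustomerName").text = row["CustomerName"]
    return root


grouped = group_by_state(ROWS)
print(len(grouped.findall("customer")))        # 2 groups for sorted input

# The same data in unsorted order splits Connecticut into two groups.
unsorted_rows = [ROWS[0], ROWS[2], ROWS[1]]
print(len(group_by_state(unsorted_rows).findall("customer")))   # 3 groups
```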