Scenario 7: Restructuring products data using multiple loop elements - 6.1

Talend Components Reference Guide

The following scenario creates a four-component Job that restructures the products data from an XML source file ProductsIn.xml using multiple loop elements.

These four components are:

  • tFileInputXML: reads the source products data and passes it to the tXMLMap component.

  • tXMLMap: transforms the input flow into the expected streamlined structure.

  • tLogRow: presents the execution result on the console.

  • tFileOutputXML: writes the output flow into an XML file.

The content of the source XML file ProductsIn.xml is as follows:

<?xml version="1.0" encoding="ISO-8859-15"?>
<products category="1" name="laptop">

	<!-- Summary -->
	<summary>
		<company>DELL, HP</company>
		<sales unit="Dollars">12345678910.12345</sales>
		<model>business</model>
	</summary>

	<!-- Loop1 manufacture -->
	<manufacture id="manu_1" date="2012-10-30">
		<name>DELL</name>
	</manufacture>
	<manufacture id="manu_2" date="2012-10-28">
		<name>HP</name>
	</manufacture>

	<!-- Loop2 types -->
	<types model="business1">
		<type>DELL123</type>
		<manufacture_id>manu_1</manufacture_id>
	</types>
	<types model="business2">
		<type>HP123</type>
		<manufacture_id>manu_2</manufacture_id>
	</types>

	<!-- Loop3 sale -->
	<sales>
		<sale unit="Dollars" type="DELL123">
			<quater>1</quater>
			<income>12345</income>
		</sale>
		<sale unit="Dollars" type="HP123">
			<quater>1</quater>
			<income>12345.123</income>
		</sale>
	</sales>
</products>

The objective of this scenario is to restructure the products data so that the product information is presented in a streamlined form suited to manufacturing operations. The expected output data is shown below: the root element is changed to manufacturers, the sales information is consolidated into the sale element, and the manufacturer element is reduced to a single level.

<?xml version="1.0" encoding="ISO-8859-15"?>
<manufacturers category="1" name="laptop">
  <sales unit="Dollars">
    <sale sales_type="DELL123">12345.0</sale>
    <sale sales_type="HP123">12345.123</sale>
  </sales>
  <manufacturer id="manu_1" date="03-04-0036" name="DELL"/>
  <manufacturer id="manu_2" date="04-04-0034" name="HP"/>
  <types>
    <type>DELL123</type>
    <manufacturer_id>manu_1</manufacturer_id>
  </types>
  <types>
    <type>DELL123</type>
    <manufacturer_id>manu_2</manufacturer_id>
  </types>
  <types>
    <type>HP123</type>
    <manufacturer_id>manu_1</manufacturer_id>
  </types>
  <types>
    <type>HP123</type>
    <manufacturer_id>manu_2</manufacturer_id>
  </types>
</manufacturers>
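
Outside Talend Studio, the three changes described above (renamed root, consolidated sales, flattened manufacturer) can be sketched in a few lines of Python. This is only an illustration of the target structure, not what tXMLMap executes internally. The function name restructure and the inlined, abbreviated source data are assumptions made for the example, and the types loop is left out for brevity. Values are copied as plain text, so number and date formatting may differ from the listing above.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of ProductsIn.xml, inlined so the sketch runs on its own.
SOURCE = """<products category="1" name="laptop">
  <summary><sales unit="Dollars">12345678910.12345</sales></summary>
  <manufacture id="manu_1" date="2012-10-30"><name>DELL</name></manufacture>
  <manufacture id="manu_2" date="2012-10-28"><name>HP</name></manufacture>
  <sales>
    <sale unit="Dollars" type="DELL123"><quater>1</quater><income>12345</income></sale>
    <sale unit="Dollars" type="HP123"><quater>1</quater><income>12345.123</income></sale>
  </sales>
</products>"""


def restructure(xml_text):
    """Hypothetical helper: build the manufacturers tree from the products tree."""
    src = ET.fromstring(xml_text)
    # Root element renamed; its attributes are carried over unchanged.
    out = ET.Element("manufacturers",
                     category=src.get("category"), name=src.get("name"))
    # Sales: the unit comes from summary/sales, one <sale> per source sale.
    sales = ET.SubElement(out, "sales",
                          unit=src.find("summary/sales").get("unit"))
    for sale in src.findall("sales/sale"):
        item = ET.SubElement(sales, "sale", sales_type=sale.get("type"))
        item.text = sale.findtext("income")
    # Manufacturer: the child <name> becomes an attribute (single level).
    for manu in src.findall("manufacture"):
        ET.SubElement(out, "manufacturer", id=manu.get("id"),
                      date=manu.get("date"), name=manu.findtext("name"))
    return out


result = restructure(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```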

Adding and linking the components

  1. Create a new Job and add a tFileInputXML component, a tXMLMap component, a tLogRow component, and a tFileOutputXML component by typing their names in the design workspace or dropping them from the Palette.

  2. Link the tFileInputXML component to the tXMLMap component using a Row > Main connection.

  3. Link the tXMLMap component to the tLogRow component using a Row > *New Output* (Main) connection. In the pop-up dialog box, enter the name of the output connection, outDoc in this example.

  4. Link the tLogRow component to the tFileOutputXML component using a Row > Main connection.

Configuring the input flow

  1. Double-click the tFileInputXML component to open its Basic settings view.

  2. Click the [...] button next to Edit schema and in the [Schema] dialog box define the schema by adding one column doc of Document type.

  3. Click OK to validate the changes and close the dialog box. One row is added automatically to the Mapping table.

  4. In the File name/Stream field, browse to the XML source file that provides the products data, or type its path between double quotation marks. In this scenario, it is E:/ProductsIn.xml.

  5. In the Loop XPath query field, type in an XPath expression between double quotation marks to specify the node on which the loop is based. In this scenario, it is "/", which means the loop query is performed from the root.

  6. In the XPath query column of the Mapping table, type in the fields to be queried between double quotation marks. In this scenario, it is ".", which means that everything under the current node (the root) is extracted.

  7. In the Get Nodes column of the Mapping table, select the check box.
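
The combined effect of these settings (looping on "/", querying ".", and selecting Get Nodes) is that the whole document reaches tXMLMap as a single row whose doc column holds the XML tree. A rough Python analogy, assuming an abbreviated inline copy of the source file and a hypothetical helper named read_document_rows, could look like this:

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of ProductsIn.xml, inlined so the sketch runs on its own.
SOURCE = """<products category="1" name="laptop">
  <summary><company>DELL, HP</company></summary>
  <manufacture id="manu_1" date="2012-10-30"><name>DELL</name></manufacture>
  <manufacture id="manu_2" date="2012-10-28"><name>HP</name></manufacture>
</products>"""


def read_document_rows(xml_text):
    """Hypothetical helper: one row per loop node; looping on "/" gives one row."""
    root = ET.fromstring(xml_text)
    # XPath "." with Get Nodes selected: keep the node itself, not its text.
    return [{"doc": root}]


rows = read_document_rows(SOURCE)
doc = rows[0]["doc"]
print(len(rows), doc.tag, doc.attrib["name"], len(doc.findall("manufacture")))
```

The single row still contains both manufacture elements, which is why the loops are defined later in tXMLMap rather than in tFileInputXML.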

Configuring tXMLMap with multiple loops

  1. Double-click the tXMLMap component to open its Map Editor.

    Note that the input area is already filled with the default basic XML structure and the top table is the main input table.

  2. In the row1 input table, right-click the doc node and from the contextual menu select Import From File. In the pop-up dialog box, browse to the XML source file to import the XML structure of the data that tXMLMap will receive. In this scenario, the XML source file is ProductsIn.xml, which contains the input data to tFileInputXML.

  3. In the imported XML tree, right-click the manufacture node and from the contextual menu select As loop element to set it as a loop element. Then do the same to set the types node and the sale node as loop elements.

  4. On the lower part of the map editor, click the Schema editor tab to display the corresponding view. Then on the right side of this view, add one column outDoc of Document type to the schema table. The corresponding XML root is added automatically to the output table on the top right side which represents the output flow.

  5. In the outDoc output table, import the XML structure from the XML file that contains the expected output data.

    Right-click the sale node in the output table and select As loop element from the contextual menu. Then do the same to set the manufacturer node and the types node as loop elements respectively.

  6. In the row1 input table, click the @category node and drop it to the Expression field of the @category node in the outDoc output table.

    Do the same to map other nodes from the input table to the output table:

    • the @name node to the @name node,

    • the @unit node under the summary node to the @unit node,

    • the @id node to the @id node and to the manufacturer_id node respectively,

    • the @date node to the @date node,

    • the name node to the @name node,

    • the type node to the type node,

    • the @type node to the @sales_type node, and

    • the income node to the sale (loop) node.

  7. At the top of the outDoc output table, click the wrench icon and set the value of the All in one property to true to generate a single XML flow. For further information about the All in one feature, see the Talend Studio User Guide.

  8. Click the [...] button next to the manufacturer loop element and in the pop-up [Configure source loops] dialog box click the [+] button to add one source loop, manufacture. Do the same to add one source loop, sale, for the sale loop element.

  9. Click the [...] button next to the types loop element and in the pop-up [Configure source loops] dialog box add two source loops, types and manufacture. Make sure the sequence number of the types source loop is 0 so that the relevant part of the output flow is sorted based on the values of the type element.

    Note

    When a loop element of the output flow receives mappings from more than one loop element of the input flow, you can set the sequence of the input loops. For example, the types loop element of the output flow in this scenario is mapped with the @id node, which belongs to the manufacture loop element of the input flow, and with the type node, which belongs to the types loop element of the input flow. The output flow is sorted according to the primary loop, types.

  10. Click OK to validate the mappings and close the Map Editor.
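
The effect of the source loop sequence on the types loop element can be pictured as nested iteration: the loop at sequence 0 is the outer loop, and every value of it is paired with every value of the second loop. The sketch below is an analogy in Python, not Talend code. The helper combine is hypothetical, and the ordering shown for the alternative sequence is an assumption derived from the same nesting logic rather than from a documented result.

```python
from itertools import product

# Values taken from the two input loops in ProductsIn.xml.
types = ["DELL123", "HP123"]            # type node of the types loop
manufacture_ids = ["manu_1", "manu_2"]  # @id node of the manufacture loop


def combine(primary, secondary):
    """Pair every primary value with every secondary value, primary outermost."""
    return list(product(primary, secondary))


# types at sequence 0: rows grouped by type, as in the expected output.
types_first = combine(types, manufacture_ids)

# manufacture at sequence 0 (assumed alternative): rows grouped by id instead.
manufacture_first = [(t, m) for m, t in combine(manufacture_ids, types)]

for type_value, manufacturer_id in types_first:
    print(type_value, manufacturer_id)
```

With types as the primary loop, the four pairs come out in the same order as the four types elements of the expected output: both DELL123 entries first, then both HP123 entries.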

Configuring the output flow

  1. Double-click the tLogRow component to open its Basic settings view.

  2. Click the Sync columns button to retrieve the schema from its preceding component and accept the propagation prompted by the pop-up dialog box.

  3. Double-click the tFileOutputXML component to open its Basic settings view.

  4. In the File Name field, browse to or enter the path to the file in which the output data will be written. In this scenario, it is E:/ProductsOut.xml.

  5. Select the Incoming record is a document check box.

Saving and executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 to execute the Job.

    The input products data is restructured as expected. The output data is displayed on the console and written to the XML file ProductsOut.xml.