Scenario 2: Launching a lookup flow to join complementary data - 6.1

Talend Components Reference Guide

Based on the previous scenario, this scenario shows how to use a lookup flow to join data of interest in the XML file USState.xml to the main flow. Another tFileInputXML component is added to the Job to load data from the lookup file USState.xml to the processing component tXMLMap.

The content of the XML file USState.xml is as follows:

<?xml version="1.0" encoding="ISO-8859-15"?>
<USStates>
  <States>
    <idState>1</idState>
    <LabelState>Alabama</LabelState>
  </States> 
  <States>
    <idState>2</idState>
    <LabelState>Connecticut</LabelState>
  </States>
  <States>
    <idState>3</idState>
    <LabelState>Ohio</LabelState>
  </States>  
  <States>
    <idState>4</idState>
    <LabelState>Wyoming</LabelState>
  </States>
  <States>
    <idState>5</idState>
    <LabelState>Hawaii</LabelState>
  </States>
</USStates>
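
To make the shape of the lookup data concrete, the following is a minimal Python sketch, not part of the Talend Job, that reads the same content into an idState-to-LabelState table. The function name load_states and the inline copy of the file are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Content of USState.xml as listed above. The XML declaration is left out
# because the text is already a decoded Python string.
US_STATE_XML = """
<USStates>
  <States><idState>1</idState><LabelState>Alabama</LabelState></States>
  <States><idState>2</idState><LabelState>Connecticut</LabelState></States>
  <States><idState>3</idState><LabelState>Ohio</LabelState></States>
  <States><idState>4</idState><LabelState>Wyoming</LabelState></States>
  <States><idState>5</idState><LabelState>Hawaii</LabelState></States>
</USStates>
"""


def load_states(xml_text):
    """Return a dict mapping each idState value to its LabelState value."""
    root = ET.fromstring(xml_text.strip())
    return {
        state.findtext("idState"): state.findtext("LabelState")
        for state in root.findall("States")
    }


states = load_states(US_STATE_XML)
print(states)
```

Each States element becomes one entry, which is the information the lookup flow makes available for the join.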

Adding and linking another input component

  1. In your Studio, open the Job used in the previous scenario to display it in the design workspace.

  2. Add another tFileInputXML component to the Job by typing its name in the design workspace or dropping it from the Palette. Label the component USStates to better identify its function.

  3. Link the tFileInputXML component labelled USStates to the tXMLMap component using a Row > Main connection. The connection is automatically converted to a lookup flow.

Configuring the input flow for lookup

  1. Double-click the tFileInputXML component labelled USStates to open its Basic settings view.

  2. Click the [...] button next to Edit schema and in the [Schema] dialog box define the schema by adding one column USState of Document type.

  3. Click OK to validate the changes and close the dialog box. One row is added automatically to the Mapping table.

  4. In the File name/Stream field, browse to or type in between double quotation marks the path to the XML source file that holds the complementary data. In this scenario, it is E:/USState.xml.

  5. In the Loop XPath query field, type in an XPath expression between double quotation marks to specify the node on which the loop is based. In this scenario, it is /, which means that the loop query is performed from the root.

  6. In the XPath query column of the Mapping table, type in the fields to be queried between double quotation marks. In this scenario, it is . (a single dot), which means that all fields under the current node (the root) are extracted.

  7. In the Get Nodes column of the Mapping table, select the check box. This retrieves the XML structure for the Document type data.
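
The effect of these settings can be approximated outside the Studio. The following Python sketch is an illustration only and does not reproduce the code that the Job generates: it assumes that looping on the root yields a single row, and that selecting the current node with Get Nodes keeps the XML structure rather than the text. The shortened sample document and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the same structure as USState.xml.
xml_text = (
    "<USStates>"
    "<States><idState>1</idState><LabelState>Alabama</LabelState></States>"
    "<States><idState>2</idState><LabelState>Connecticut</LabelState></States>"
    "</USStates>"
)

root = ET.fromstring(xml_text)

# Loop XPath query "/": the root is the only loop node, so one row is produced.
loop_nodes = [root]

# XPath query "." with Get Nodes selected: keep the current node together
# with its whole subtree, serialized here as a string for the USState column.
rows = [
    {"USState": ET.tostring(node.find("."), encoding="unicode")}
    for node in loop_nodes
]

print(len(rows))
print(rows[0]["USState"])
```

The single row carries the complete document, which is why the loop over the States elements is defined later in tXMLMap rather than in this component.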

Configuring tXMLMap for transformation

  1. Double-click the tXMLMap component to open its Map Editor.

    Note that the input area is already filled with the defined input tables and the top table is the main input table.

  2. In the row2 input table, right-click the USState node and from the contextual menu select Import From File. In the pop-up dialog box, browse to the XML source file to import the XML tree structure of the data that tXMLMap will receive. In this scenario, the XML source file is USState.xml, which is the input file of the tFileInputXML component labelled USStates.

  3. In the imported XML tree, right-click the States node and from the contextual menu select As loop element to set it as the loop element.

  4. In the row1 main input table, click the idState node and drop it to the Exp.key column in the row of the idState node in the row2 lookup input table. This creates a join between the two input tables on idState, with the idState node from the main flow providing the lookup key.

  5. In the row2 lookup input table, click the LabelState node and drop it on the customer node in the Customer output table. A dialog box pops up.

  6. In the pop-up dialog box, select Create as sub-element of target node and click OK. A new LabelState sub-element is added to the output XML tree and mapped with the LabelState node in the lookup input table.

  7. Click OK to validate the mappings and close the Map Editor.
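
The mapping defined above amounts to a keyed join: each States element of the lookup is indexed by idState, and the matching LabelState is appended under each customer node. The following Python sketch illustrates that logic only. The main-flow document (MAIN_XML), its element names other than idState and customer, and the function name join_states are hypothetical, since the main input file belongs to the previous scenario. The sketch also assumes that a customer without a matching state is kept unchanged; the actual behavior depends on the join settings chosen in the Map Editor.

```python
import xml.etree.ElementTree as ET

LOOKUP_XML = (
    "<USStates>"
    "<States><idState>2</idState><LabelState>Connecticut</LabelState></States>"
    "<States><idState>3</idState><LabelState>Ohio</LabelState></States>"
    "</USStates>"
)

# Hypothetical main-flow data: only idState and customer come from the scenario.
MAIN_XML = (
    "<Customers>"
    "<customer><name>Customer A</name><idState>2</idState></customer>"
    "<customer><name>Customer B</name><idState>3</idState></customer>"
    "<customer><name>Customer C</name><idState>9</idState></customer>"
    "</Customers>"
)


def join_states(main_xml, lookup_xml):
    """Append LabelState under each customer whose idState has a match."""
    # Loop element States: one lookup entry per States node, keyed by idState.
    lookup = {
        state.findtext("idState"): state.findtext("LabelState")
        for state in ET.fromstring(lookup_xml).findall("States")
    }
    main = ET.fromstring(main_xml)
    for customer in main.findall("customer"):
        label = lookup.get(customer.findtext("idState"))
        if label is not None:
            # Equivalent of "Create as sub-element of target node".
            ET.SubElement(customer, "LabelState").text = label
    return main


joined = join_states(MAIN_XML, LOOKUP_XML)
for customer in joined.findall("customer"):
    print(customer.findtext("name"), customer.findtext("LabelState"))
```

Indexing the lookup once and probing it per main row mirrors the role of the lookup key: the main flow drives the iteration and the lookup flow only supplies complementary values.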

Saving and executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 to run the Job.

    The state names from the lookup file whose state IDs match those in the main input file are added to the data flow, and the combined information is displayed on the console.

A step-by-step tutorial related to this Join topic is available on the Talend Technical Community Site. For further information, see http://talendforge.org/tutorials/tutorial.php?language=english&idTuto=101.