Scenario 2: Extracting erroneous XML data via a reject flow - 6.1

Talend Components Reference Guide

This Java scenario describes a three-component Job that reads an XML file and:

  1. returns the correct XML data in an output XML file,

  2. displays on the console the erroneous XML data, that is, data whose type does not match the type defined in the schema.
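The split between the two flows can be pictured with a minimal Java sketch. It is not the code Talend Studio generates: the class name `RejectFlowSketch`, its methods, and the sample values are hypothetical, and it assumes a single column declared as Integer in the schema. A value that converts to the declared type goes to the main flow; a value that does not goes to the reject flow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not code generated by Talend Studio.
class RejectFlowSketch {

    /** Returns true when the raw text converts to the Integer type assumed for the column. */
    static boolean matchesIntegerType(String raw) {
        if (raw == null) {
            return false;
        }
        try {
            Integer.parseInt(raw.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Sends convertible values to mainFlow and every other value, unchanged, to rejectFlow. */
    static void route(List<String> rawValues, List<Integer> mainFlow, List<String> rejectFlow) {
        for (String raw : rawValues) {
            if (matchesIntegerType(raw)) {
                mainFlow.add(Integer.parseInt(raw.trim()));
            } else {
                rejectFlow.add(raw);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> mainFlow = new ArrayList<>();
        List<String> rejectFlow = new ArrayList<>();
        // Made-up sample values: the third one is not a valid integer.
        route(Arrays.asList("1", "2", "abc"), mainFlow, rejectFlow);
        System.out.println("main flow:   " + mainFlow);
        System.out.println("reject flow: " + rejectFlow);
    }
}
```

In the Job itself, the main flow corresponds to the Row > Main link and the reject flow to the Row > Reject link.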

  1. Drop the following components from the Palette to the design workspace: tFileInputXML, tFileOutputXML and tLogRow.

    Right-click tFileInputXML and select Row > Main in the contextual menu and then click tFileOutputXML to connect the components together.

    Right-click tFileInputXML and select Row > Reject in the contextual menu and then click tLogRow to connect the components together using a reject link.

  2. Double-click tFileInputXML to display the Basic settings view and define the component properties.

  3. In the Property Type list, select Repository if you have already stored the metadata of the input file in the File xml node under the Metadata folder of the Repository tree view. Then click the three-dot button next to the field to open the [Repository Content] dialog box and select that metadata. The fields that follow are filled in automatically with the fetched data. Otherwise, select Built-in and fill in the fields that follow manually.

    For more information about storing schema metadata in the Repository tree view, see the Talend Studio User Guide.

  4. In the Schema Type list, select Repository if you have already stored the schema that describes the structure of the input file in the Repository tree view, then click the three-dot button to open the dialog box and select that schema. Otherwise, select Built-in and click the three-dot button next to Edit schema to open a dialog box where you can define the schema manually.

    The schema in this example consists of five columns: id, CustomerName, CustomerAddress, idState and id2.

  5. Click the three-dot button next to the Filename field and browse to the XML file you want to process.

  6. In the Loop XPath query field, enter between double quotation marks the path of the XML node on which to loop in order to retrieve data.

    In the Mapping table, Column is automatically populated with the columns of the defined schema.

    In the XPath query column, enter between double quotation marks the node of the XML file that holds the data you want to extract for the corresponding column.

  7. In the Limit field, enter the number of lines to be processed. In this example, enter 10 to process only the first 10 lines.

  8. Double-click tFileOutputXML to display its Basic settings view and define the component properties.

  9. Click the three-dot button next to the File Name field and browse to the output XML file in which you want to collect the data, customer_data.xml in this example.

    In the Row tag field, enter between double quotation marks the name you want to give to the tag that will hold the retrieved data.

    Click Edit schema to display the schema dialog box and make sure that the schema matches that of the preceding component. If not, click Sync columns to retrieve the schema from the preceding component.

  10. Double-click tLogRow to display its Basic settings view and define the component properties.

    Click Edit schema to open the schema dialog box and make sure that the schema matches that of the preceding component. If not, click Sync columns to retrieve the schema of the preceding component.

    In the Mode area, select the Vertical option.

  11. Save your Job and press F6 to execute it.

The output file customer_data.xml, which holds the correct XML data, is created in the defined path, and the erroneous XML data is displayed on the console of the Run view.
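To make the Loop XPath query, the per-column XPath queries, and the Limit field more concrete, the following sketch reproduces the same idea with the standard `javax.xml.xpath` API. It is an illustration under stated assumptions, not the component's implementation: the class `XPathLoopSketch`, the sample document, the loop path `/Customers/Customer`, and the node names are all hypothetical, since the scenario does not show the structure of the input file. Only two of the five schema columns are used to keep the sample short.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not code generated by Talend Studio.
class XPathLoopSketch {

    // Made-up input: the real file used in the scenario is not shown.
    static final String SAMPLE_XML =
        "<Customers>"
      + "<Customer><id>1</id><CustomerName>Ann</CustomerName></Customer>"
      + "<Customer><id>x2</id><CustomerName>Bob</CustomerName></Customer>"
      + "<Customer><id>3</id><CustomerName>Cy</CustomerName></Customer>"
      + "</Customers>";

    /**
     * Loops on the nodes selected by loopXPath, evaluates each column XPath
     * relative to the current loop node, and stops once 'limit' rows are read.
     */
    static List<String[]> extract(String xml, String loopXPath, String[] columnXPaths, int limit) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(loopXPath, doc, XPathConstants.NODESET);

            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < nodes.getLength() && rows.size() < limit; i++) {
                Node loopNode = nodes.item(i);
                String[] row = new String[columnXPaths.length];
                for (int c = 0; c < columnXPaths.length; c++) {
                    // Relative query: resolved against the current loop node.
                    row[c] = xpath.evaluate(columnXPaths[c], loopNode);
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the XML input", e);
        }
    }

    public static void main(String[] args) {
        List<String[]> rows =
            extract(SAMPLE_XML, "/Customers/Customer", new String[] {"id", "CustomerName"}, 10);
        for (String[] row : rows) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

The sketch returns every value as raw text. In the Job, converting that text to the column types defined in the schema is the step at which a row such as the second sample customer, whose id is not numeric, would be expected to leave through the reject link rather than the main link.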