Scenario: Validating data flows against an XSD file - 6.1

Talend Open Studio for Big Data Components Reference Guide


This scenario describes a Job that validates an XML column in an input file against a reference XSD file and outputs the log information for the invalid rows of the column into a delimited file. For the tXSDValidator use case that validates an XML file, see Scenario: Validating XML files.

  1. Drop a tFileInputDelimited component, a tXSDValidator component, and two tFileOutputDelimited components from the Palette to the design workspace.

  2. Double-click the tFileInputDelimited component to open its Component view and set its properties:

    Use the Built-In property type for this scenario.

    Browse to the input file, and define the number of rows to be skipped at the beginning of the file.

    Use a Built-In schema for this scenario. This means that the schema is available for this Job only.

    Click Edit schema and edit the schema according to the input file. In this scenario, the input file has only two columns: ID and ShipmentInfo. The ShipmentInfo column is an XML column and needs to be validated.

  3. On your design workspace, connect the tFileInputDelimited component to the tXSDValidator component using a Row > Main link.

  4. Double-click the tXSDValidator component and set its properties:

    From the Mode drop-down list, select Flow Mode.

    Use a Built-In schema for this scenario. Click Sync columns to retrieve the schema from the preceding component. To view or modify the schema, click the three-dot button next to Edit schema.

    Add a line in the Allocate table by clicking the plus button. The name of the first column of the input file automatically appears in the Input Column field. Click in the field and select the column you want to validate (ShipmentInfo in this scenario).

    In the XSD File field, fill in the path to your reference XSD file.

  5. On your design workspace, connect the tXSDValidator component to one tFileOutputDelimited component using a Row > Main link to output the information about valid XML rows.

  6. Connect the tXSDValidator component to the other tFileOutputDelimited component using a Row > Rejects link to output the information about invalid XML rows.

  7. Double-click each of the two tFileOutputDelimited components and configure its properties:

    In the File Name field, enter the output file path, or browse to an existing output file if you want to reuse one.

    Select Built-In from the Schema list and click Sync columns to retrieve the schema from the preceding component.

  8. Save your Job and press F6 to run it.

After the Job runs, one output file contains the validation information for the valid XML rows of the specified column, and the other contains the validation information for the invalid rows.
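
For readers who want to see what this kind of row-by-row validation amounts to outside the Studio, the following is a minimal Java sketch that uses the standard javax.xml.validation API. It is not the code that the Job generates. The class name XsdFlowSketch, the inline XSD, the sample rows, the semicolon field separator, and the layout of the reject lines are all assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class XsdFlowSketch {

    // Hypothetical reference XSD. In the Job, this role is played by the
    // file set in the XSD File field.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"shipment\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"weight\" type=\"xs:decimal\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Compiles the XSD text into a reusable Schema object.
    static Schema loadSchema(String xsd) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("The XSD could not be compiled", e);
        }
    }

    // Returns null when the XML value is valid, otherwise an error message.
    static String validationError(Schema schema, String xml) {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException | IOException e) {
            return String.valueOf(e.getMessage());
        }
    }

    // Splits each "ID;ShipmentInfo" row (assumed layout) and routes it to the
    // valid list or to the rejects list, mimicking the Main and Rejects links.
    static void route(List<String> rows, Schema schema,
                      List<String> valid, List<String> rejects) {
        for (String row : rows) {
            String[] fields = row.split(";", 2);
            String error = validationError(schema, fields[1]);
            if (error == null) {
                valid.add(row);
            } else {
                // Illustrative reject layout: the row ID plus the error text.
                rejects.add(fields[0] + ";" + error);
            }
        }
    }

    public static void main(String[] args) {
        Schema schema = loadSchema(XSD);
        List<String> rows = Arrays.asList(
            "1;<shipment><weight>12.5</weight></shipment>",
            "2;<shipment><weight>heavy</weight></shipment>");
        List<String> valid = new ArrayList<>();
        List<String> rejects = new ArrayList<>();
        route(rows, schema, valid, rejects);
        System.out.println("valid rows: " + valid.size()
            + ", rejected rows: " + rejects.size());
    }
}
```

In this sketch, the second sample row is rejected because its weight value is not a decimal number. The split into two lists loosely corresponds to the two tFileOutputDelimited components connected through the Row > Main and Row > Rejects links.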