tXSDValidator - 6.1

Talend Components Reference Guide

tXSDValidator Properties

Component family

XML

 

Function

Validates an input XML file or an input XML flow against an XSD file and sends the validation log to the defined output.

Purpose

Helps control the data and structure quality of the file or flow to be processed.

Basic settings

Mode

From this dropdown list, select:

- File, to validate an input file

- Flow, to validate an input flow

 

Schema and Edit Schema

A schema is a row description. It defines the number of fields to be processed and passed on to the next component.

The schema of this component is read-only. It contains standard information regarding the file validation.

Note

File mode only

XSD file

File path to the reference XSD file. An HTTP URL is also supported, for example http://localhost:8080/book.xsd.

Note

File mode only

XML file

File path to the XML file to be validated.

Note

File mode only

If XML is valid, display / If XML is invalid, display

Type in the message to be displayed in the Run console for each validation result.

Note

File mode only

Print to console

Select this check box to display the validation message.

Note

Flow mode only

Allocate

Specify the column or columns to be validated and the path to the reference XSD file.

Advanced settings

Encoding

Enter the encoding type between double quotes, for example "UTF-8".

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. If the component has a Die on error check box, this variable functions only when that check box is cleared.

DIFFERENCE: the result of the validation. This is a Flow variable and it returns a string.

VALID: the validation result. This is a Flow variable and it returns a boolean.

XSD_ERROR_MESSAGE: the XSD error message generated by the component. This is a Flow variable and it returns a string.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
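
As a rough illustration of how these variables might be read in custom Java code (for example in a tJava component), the sketch below uses a plain HashMap as a stand-in for the Job's globalMap. The component name tXSDValidator_1 and the key pattern (component name, underscore, variable name) are assumptions for illustration; use Ctrl + Space in the Studio to insert the actual expression.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalMapSketch {

    // Builds a short log line from the VALID and XSD_ERROR_MESSAGE variables
    // of the given component. The key pattern is an assumption.
    static String describe(Map<String, Object> globalMap, String componentName) {
        Boolean valid = (Boolean) globalMap.get(componentName + "_VALID");
        String xsdError = (String) globalMap.get(componentName + "_XSD_ERROR_MESSAGE");
        if (Boolean.TRUE.equals(valid)) {
            return "valid";
        }
        return "invalid: " + (xsdError == null ? "no message" : xsdError);
    }

    public static void main(String[] args) {
        // Stand-in for the Job's globalMap; in a real Job the component fills it.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tXSDValidator_1_VALID", Boolean.FALSE);
        globalMap.put("tXSDValidator_1_XSD_ERROR_MESSAGE", "sample message");
        System.out.println(describe(globalMap, "tXSDValidator_1"));
    }
}
```

Because VALID and XSD_ERROR_MESSAGE are Flow variables, code like this is only meaningful while rows are passing through the component.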

Usage

When used in File mode, this component can be used as a standalone component, but it is usually linked to an output component to gather the log data.

Limitation

n/a
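
To make the File mode settings above more concrete, here is a minimal sketch of validating an XML document against an XSD with the standard javax.xml.validation API. It is not the code the component generates. The class name, the sample book XSD, and the console messages are hypothetical, and the sources are built from in-memory strings so that the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class XsdFileCheckSketch {

    // Hypothetical reference XSD: a <book> element containing one <title>.
    static final String BOOK_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='book'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Wraps a string as a Source. For real inputs, new StreamSource(new java.io.File(path))
    // or new StreamSource(url) can be used instead.
    static Source fromString(String text) {
        return new StreamSource(new StringReader(text));
    }

    // Returns null when the XML conforms to the XSD, otherwise the parser's message.
    // An XSD that cannot be compiled is also reported through the message.
    static String validate(Source xsd, Source xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(xsd);
            schema.newValidator().validate(xml);
            return null;
        } catch (SAXException e) {
            return String.valueOf(e.getMessage());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] documents = {
            "<book><title>Dune</title></book>",
            "<book><author>Herbert</author></book>"
        };
        for (String doc : documents) {
            // Sources backed by a Reader are single-use, so new ones are built per check.
            String error = validate(fromString(BOOK_XSD), fromString(doc));
            // Plays the role of the two "display" messages in the Basic settings.
            System.out.println(error == null ? "XML is valid" : "XML is invalid: " + error);
        }
    }
}
```

The second document is rejected because it contains an author element where the sample XSD expects a title.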

Scenario: Validating data flows against an XSD file

This scenario describes a Job that validates an XML column in an input file against a reference XSD file and outputs the log information for the invalid rows of the column into a delimited file. For the tXSDValidator use case that validates an XML file, see Scenario: Validating XML files.

  1. Drop a tFileInputDelimited component, a tXSDValidator component, and two tFileOutputDelimited components from the Palette to the design workspace.

  2. Double-click the tFileInputDelimited to open its Component view and set its properties:

  3. Use the Built-In property type for this scenario.

    Browse to the input file, and define the number of rows to be skipped at the beginning of the file.

    Use a Built-In schema for this scenario. This means that it is available for this Job only.

    Click Edit schema and edit the schema according to the input file. In this scenario, the input file has only two columns: ID and ShipmentInfo. The ShipmentInfo column is an XML column and needs to be validated.

  4. On your design workspace, connect the tFileInputDelimited component to the tXSDValidator component using a Row > Main link.

  5. Double-click the tXSDValidator component, and set its properties:

  6. From the Mode dropdown list, select Flow Mode.

    Use a Built-In schema for this scenario. Click Sync columns to retrieve the schema from the preceding component. To view or modify the schema, click the three-dot button next to Edit schema.

    Add a line in the Allocate table by clicking the plus button. The name of the first column of the input file automatically appears in the Input Column field. Click in the field and select the column you want to validate.

    In the XSD File field, fill in the path to your reference XSD file.

  7. On your design workspace, connect the tXSDValidator component to one tFileOutputDelimited component using a Row > Main link to output the information about valid XML rows.

  8. Connect the tXSDValidator component to the other tFileOutputDelimited component using a Row > Rejects link to output the information about invalid XML rows.

  9. Double-click each of the two tFileOutputDelimited components and configure the component properties.

    In the File Name field, enter the output file path or, if you want to use an existing output file, browse to it.

  10. Select Built-In from the Schema list and click Sync columns to retrieve the schema from the preceding component.

  11. Save your Job and press F6 to run it.

The output files contain the validation information about the valid and invalid XML rows of the specified column respectively.
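
The row-by-row logic of this scenario can be sketched in plain Java as follows. This is an illustration under assumptions, not the code generated by the Job: the shipment XSD and the sample rows are hypothetical (the scenario does not show the real files), and the two lists stand in for the Row > Main and Row > Rejects outputs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class FlowValidationSketch {

    // Hypothetical XSD for the ShipmentInfo column: <shipment> with a decimal <weight>.
    static final String SHIPMENT_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='shipment'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='weight' type='xs:decimal'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Compiles the XSD once so that it can be reused for every row.
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("XSD could not be compiled", e);
        }
    }

    // Each row is {ID, ShipmentInfo}. IDs of valid rows go to validIds;
    // invalid rows go to rejects as "ID;parser message".
    static void split(List<String[]> rows, Schema schema,
                      List<String> validIds, List<String> rejects) {
        for (String[] row : rows) {
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(row[1])));
                validIds.add(row[0]);
            } catch (SAXException e) {
                rejects.add(row[0] + ";" + e.getMessage());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "<shipment><weight>12.5</weight></shipment>"},
            new String[] {"2", "<shipment><weight>heavy</weight></shipment>"});
        List<String> validIds = new ArrayList<>();
        List<String> rejects = new ArrayList<>();
        split(rows, compile(SHIPMENT_XSD), validIds, rejects);
        System.out.println("Main: " + validIds);
        System.out.println("Rejects: " + rejects);
    }
}
```

In this sketch, row 1 lands in the main list and row 2 in the rejects list, because "heavy" is not a decimal value.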