tDTDValidator - 6.3

Talend Open Studio for Big Data Components Reference Guide


Function

Validates the XML input file against a DTD file and sends the validation log to the defined output.
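To illustrate what DTD validation involves, here is a minimal sketch using the standard JAXP parser. It is not the code Talend generates for this component. The class name DtdValidationSketch and the validate method are hypothetical, and the sketch assumes that the XML document carries a DOCTYPE declaration, whose system identifier is then redirected to the reference DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of Talend: validates XML text against DTD text.
final class DtdValidationSketch {

    /** Returns the validation errors; an empty list means the document is valid. */
    static List<String> validate(String xml, String dtd) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Assumption: serve the reference DTD whatever system identifier the DOCTYPE names.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader(dtd)));

            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });

            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            errors.add(e.getMessage()); // the document is not well formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String dtd = "<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>";
        String xml = "<?xml version=\"1.0\"?><!DOCTYPE note SYSTEM \"note.dtd\">"
                   + "<note><to>A</to></note>";
        List<String> errors = validate(xml, dtd);
        System.out.println(errors.isEmpty() ? "valid" : "invalid: " + errors);
    }
}
```

The error list plays the role of the validation log: a validity error is recorded and parsing continues, while a well-formedness error stops the parse.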

Purpose

Helps control the data and structure quality of the file to be processed.

tDTDValidator Properties

Component family

XML

 

Basic settings

Schema and Edit Schema

A schema is a row description: it defines the number of fields to be processed and passed on to the next component.

The schema of this component is read-only. It contains standard information regarding the file validation.

 

DTD file

Filepath to the reference DTD file.

 

XML file

Filepath to the XML file to be validated.

 

If XML is valid, display / If XML is invalid, display

Type in the message to be displayed in the Run console for each outcome of the validation.

 

Print to console

Select this check box to display the validation message.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the processing metadata at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

DIFFERENCE: the result of the validation. This is a Flow variable and it returns a string.

VALID: the validation result. This is a Flow variable and it returns a boolean.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use.

For further information about variables, see Talend Studio User Guide.
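As an illustration of how such variables are typically read in a Java expression, the sketch below uses a plain HashMap as a stand-in for the Job's globalMap. The component label tDTDValidator_1 and the helper class GlobalMapSketch are assumptions made for this example; the key pattern (component label, underscore, variable name) follows the tFileList_1_CURRENT_FILE expression used in the scenario below.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Job: a plain map plays the role of globalMap.
final class GlobalMapSketch {

    /** Builds a short status message from the variables published for one component. */
    static String describe(Map<String, Object> globalMap, String component) {
        // Values are stored as Object, so each read needs a cast to the documented type.
        Boolean valid = (Boolean) globalMap.get(component + "_VALID");
        String difference = (String) globalMap.get(component + "_DIFFERENCE");
        if (valid == null) {
            return component + ": no result yet"; // the component has not run
        }
        return component + ": " + (valid ? "valid" : "invalid (" + difference + ")");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tDTDValidator_1_VALID", Boolean.FALSE);
        globalMap.put("tDTDValidator_1_DIFFERENCE", "sample validation message");
        System.out.println(describe(globalMap, "tDTDValidator_1"));
    }
}
```

The null check matters because a variable is absent from the map until the component that sets it has run.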

Usage

This component can be used as a standalone component, but it is usually linked to an output component to gather the log data.

Limitation

n/a

Scenario: Validating XML files

This scenario describes a Job that validates the specified type of files from a folder, displays the validation result on the Run tab console, and outputs the log information for the invalid files into a delimited file.

  1. Drop the following components from the Palette to the design workspace: tFileList, tDTDValidator, tMap, tFileOutputDelimited.

  2. Connect the tFileList to the tDTDValidator with an Iterate link, and connect the remaining components with Row > Main links.

  3. Set the tFileList component properties to fetch the XML files from a folder.

    Click the plus button to add a filemask line and enter the filemask "*.xml". The field holds a Java expression, so the double quotes are required.

    Set the path of the XML files to be verified.

    Select No from the Case Sensitive drop-down list.

  4. In the tDTDValidator Component view, the schema is read-only as it contains standard log information related to the validation process.

    In the DTD file field, browse to the DTD file to be used as reference.

  5. Click in the XML file field, press Ctrl+Space to access the variable list, and double-click the current filepath global variable: tFileList.CURRENT_FILEPATH. This inserts the expression ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).

  6. In the message fields, use the jobName variable to insert the Job name and the relevant global variable to insert the file name: ((String)globalMap.get("tFileList_1_CURRENT_FILE")). These fields hold Java expressions, so enclose any literal text in double quotes.

    Select the Print to Console check box.

  7. In the tMap component, drag and drop the information data from the standard schema that you want to pass on to the output file.

  8. Once the output schema is defined as required, add a filter condition to select the log information only when the XML file is invalid.

    As a best practice, type the expected value first, then the operator suited to the type of data filtered, then the variable to be tested. In this case: 0 == row1.validate.

  9. If not already done, connect the tMap to the tFileOutputDelimited component using a Row > Main connection. Give the connection a meaningful name, in this example log_errorsOnly.

  10. In the tFileOutputDelimited Basic settings, define the destination filepath, the field delimiters and the encoding.

  11. Save your Job and press F6 to run it.

    The Run console displays the defined message for each file. At the same time, the output file is filled with the log data for the invalid files.
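For readers who want to see the logic of this Job outside the Studio, the steps above can be sketched in plain Java as follows. This is an approximation under stated assumptions, not the code generated by Talend: the class ValidateFolderSketch and its methods are hypothetical, the semicolon delimiter and UTF-8 encoding are arbitrary choices, and only the first error of each invalid file is logged.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical plain-Java approximation of the Job in this scenario.
final class ValidateFolderSketch {

    /** Returns the first validation problem of xmlFile, or null if it is valid against dtdFile. */
    static String firstError(Path xmlFile, Path dtdFile) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Assumption: always use the reference DTD, whatever the DOCTYPE points to.
            builder.setEntityResolver((pub, sys) -> new InputSource(dtdFile.toUri().toString()));
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored */ }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(xmlFile.toFile());
        } catch (SAXException e) {
            errors.add(e.getMessage()); // not well formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors.isEmpty() ? null : errors.get(0);
    }

    /** Validates each *.xml file of folder (case-insensitive mask) and logs the invalid ones. */
    static int run(Path folder, Path dtdFile, Path logFile) {
        try {
            List<Path> xmlFiles;
            try (Stream<Path> entries = Files.list(folder)) {
                xmlFiles = entries
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".xml"))
                    .sorted()
                    .collect(Collectors.toList());
            }
            List<String> logLines = new ArrayList<>();
            for (Path xml : xmlFiles) {
                String name = xml.getFileName().toString();
                String error = firstError(xml, dtdFile);
                // Console message, one per file, as with the Print to console option.
                System.out.println(name + (error == null ? " is valid" : " is invalid"));
                if (error != null) {
                    logLines.add(name + ";" + error); // keep log rows for invalid files only
                }
            }
            Files.write(logFile, logLines, StandardCharsets.UTF_8);
            return logLines.size(); // number of invalid files
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage: java ValidateFolderSketch <folder> <dtd file> <log file>
        int invalid = run(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
        System.out.println(invalid + " invalid file(s) logged");
    }
}
```

The loop stands in for the Iterate link, the condition on error for the tMap filter, and the final write for the tFileOutputDelimited component. A production version would also escape delimiters and line breaks inside the error text.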