tFileInputXML Standard properties - 7.2

XML connectors

Author: Talend Documentation Team
Version: 7.2
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Open Studio for Big Data, Talend Open Studio for Data Integration, Talend Open Studio for ESB, Talend Real-Time Big Data Platform
Platform: Talend Studio

These properties are used to configure tFileInputXML running in the Standard Job framework.

The Standard tFileInputXML component belongs to the File and the XML families.

The component in this framework is available in all Talend products.

Basic settings

Property type

Either Built-In or Repository.

Built-In: No property data stored centrally.

Repository: Select the repository file where the properties are stored.

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word "line" when naming the fields.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the Repository Content window.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

File name/Stream

File name: Name and path of the file to be processed.

Warning: Use an absolute path (instead of a relative path) for this field to avoid possible errors.

Stream: The data flow to be processed. The data must be added to the flow so that tFileInputXML can fetch it through the corresponding variable.

This variable may already be predefined in your Studio or provided by the context or by the components used along with this component, for example, the INPUT_STREAM variable of tFileFetch. Otherwise, you can define it manually according to the design of your Job, for example, using tJava or tJavaFlex.

To avoid typing the variable by hand, you can select it from the auto-completion list (Ctrl+Space) to fill in the current field, provided that the variable has been properly defined.

For more information about the available variables, see the Talend Studio User Guide. For a related scenario using an input stream, see Reading data from a remote file in streaming mode.
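To make the Stream option more concrete, the Java sketch below imitates what happens in a Job: one step places an InputStream in a map under a key, and the Stream field then holds an expression that casts the value back, for example ((java.io.InputStream)globalMap.get("tFileFetch_1_INPUT_STREAM")) when the stream comes from tFileFetch (the exact key depends on the component label). In this sketch, globalMap is a plain HashMap standing in for the map that Talend provides, and the key myXmlStream is hypothetical; it is an illustration, not the code the component generates.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class StreamVariableSketch {
    // Stand-in for the globalMap that a Talend Job provides (a plain HashMap here).
    static final Map<String, Object> globalMap = new HashMap<>();

    // What a tJava step might do: publish an InputStream under a hypothetical key.
    static void publishStream(String key, String xml) {
        globalMap.put(key, new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // What the Stream field expression resolves to: a cast lookup in globalMap,
    // followed here by a plain DOM parse to show that the stream is consumable.
    static String readRootName(String key) {
        try (InputStream in = (InputStream) globalMap.get(key)) {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read stream " + key, e);
        }
    }

    public static void main(String[] args) {
        publishStream("myXmlStream", "<orders><order id=\"1\"/></orders>");
        System.out.println(readRootName("myXmlStream")); // prints orders
    }
}
```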

Loop XPath query

The node of the XML tree on which the loop is based. Each node matched by this XPath expression produces one row.
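As a rough illustration of the loop semantics, the following plain-Java sketch uses the JDK javax.xml.xpath API (not Talend's own code) to evaluate a loop XPath such as /orders/order; each matched node would correspond to one output row. The sample XML and element names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LoopXPathSketch {
    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the nodes matched by the loop XPath, that is, the number of rows produced.
    static int countRows(String xml, String loopXPath) {
        try {
            NodeList loopNodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(loopXPath, parse(xml), XPathConstants.NODESET);
            return loopNodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/><note/></orders>";
        System.out.println(countRows(xml, "/orders/order")); // 2 rows; <note> is not a loop node
    }
}
```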

Mapping

Column: Columns to map. They reflect the schema as defined in the Schema field.

XPath Query: Enter the XPath expression of each field to be extracted from the structured input. These expressions are relative to the node defined in Loop XPath query.

Get nodes: Select this check box to retrieve the XML content of all the nodes specified in the XPath Query list, or select the check box next to specific XML nodes to retrieve the content of the selected nodes only. This is useful when the output flow of this component needs to keep the XML structure, for example, with the Document data type.

For more information about the Document type, see the Talend Studio User Guide.

Note:

The Get nodes option works in the Dom4j and SAX modes, although namespaces are not supported in SAX mode. For more information about these modes, see the Generation mode list in the Advanced settings tab.
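The mapping can be pictured with the sketch below, which again relies only on the JDK XPath and Transformer APIs. For each loop node (/customers/customer, an invented structure), the column queries @id and name are evaluated relative to that node, and a third "Get nodes"-style column keeps the serialized XML of the name element instead of its text. This only approximates the behavior described above; the component's internal handling may differ.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MappingSketch {
    // Returns one String[] per loop node: {id attribute, name text, serialized <name> node}.
    static List<String[]> extract(String xml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList loop = (NodeList) xp.evaluate("/customers/customer", doc, XPathConstants.NODESET);
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < loop.getLength(); i++) {
                Node ctx = loop.item(i);
                // Column queries are evaluated relative to the current loop node.
                String id = xp.evaluate("@id", ctx);
                String name = xp.evaluate("name", ctx);
                // "Get nodes"-style column: keep the XML markup, not just the text value.
                Node nameNode = (Node) xp.evaluate("name", ctx, XPathConstants.NODE);
                rows.add(new String[] {id, name, serialize(nameNode)});
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes a node to a string without an XML declaration.
    static String serialize(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<customers><customer id=\"7\"><name>Ann</name></customer></customers>";
        for (String[] row : extract(xml)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```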

Limit

Maximum number of rows to be processed. If Limit is 0, no rows are read or processed. If it is -1, all rows are read and processed.
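The Limit rule can be expressed as a small function over already parsed rows. This is only a sketch of the documented semantics; how the component treats negative values other than -1 is not stated, so treating them like 0 below is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LimitSketch {
    // -1 keeps every row, 0 keeps none, and n > 0 keeps at most the first n rows.
    // Other negative values are treated like 0 here; that choice is an assumption.
    static List<String> applyLimit(List<String> rows, int limit) {
        if (limit == -1) {
            return new ArrayList<>(rows);
        }
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            if (kept.size() >= limit) {
                break; // stop reading once the limit is reached
            }
            kept.add(row);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3");
        System.out.println(applyLimit(rows, 0));  // []
        System.out.println(applyLimit(rows, 2));  // [r1, r2]
        System.out.println(applyLimit(rows, -1)); // [r1, r2, r3]
    }
}
```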

Die on error

Select the check box to stop the execution of the Job when an error occurs.

Clear the check box to skip any rows on error and complete the process for error-free rows. When errors are skipped, you can collect the rows on error using a Row > Reject link.
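The difference between the two Die on error settings can be sketched as follows. The schema is assumed to contain a single Integer column, and the Result class with its mainRows and rejectRows lists is a hypothetical stand-in for the Row > Main and Row > Reject links; the real component builds its reject rows differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RejectSketch {
    // Hypothetical holder for the rows sent to the Main and Reject links.
    static final class Result {
        final List<Integer> mainRows = new ArrayList<>();
        final List<String> rejectRows = new ArrayList<>();
    }

    // Converts raw values to the (assumed) Integer schema type.
    static Result process(List<String> rawValues, boolean dieOnError) {
        Result result = new Result();
        for (String raw : rawValues) {
            try {
                result.mainRows.add(Integer.parseInt(raw.trim()));
            } catch (NumberFormatException e) {
                if (dieOnError) {
                    // Die on error selected: stop the whole run at the first bad row.
                    throw new IllegalStateException("Row in error: " + raw, e);
                }
                // Die on error cleared: skip the row and route it to the reject flow.
                result.rejectRows.add(raw + " -> " + e.getMessage());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = process(Arrays.asList("10", "abc", "30"), false);
        System.out.println(r.mainRows);          // [10, 30]
        System.out.println(r.rejectRows.size()); // 1
    }
}
```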

Advanced settings

Ignore DTD file

Select this check box to ignore the DTD file referenced in the XML file being processed.
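In plain Java, one common way to ignore a referenced DTD is to install an EntityResolver that returns an empty document for every external entity, so the parser never fetches the file. The sketch below shows that technique with the JDK DocumentBuilder; it is an assumption about how such an option can be implemented, not a description of the component's internals. The DOCTYPE in the example points to a file that does not need to exist.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IgnoreDtdSketch {
    // Parses the XML and returns the root element name, optionally skipping the external DTD.
    static String rootName(String xml, boolean ignoreDtd) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            if (ignoreDtd) {
                // Resolve every external entity, including the DTD, to an empty input.
                builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            }
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE r SYSTEM \"missing.dtd\"><r/>";
        System.out.println(rootName(xml, true)); // r, even though missing.dtd is absent
    }
}
```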

Advanced separator (for number)

Select this check box to change the separator used for numbers. By default, the thousands separator is a comma (,) and the decimal separator is a period (.).

Thousands separator: Define the separator to use for thousands.

Decimal separator: Define the separator to use for decimals.
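The effect of custom separators can be sketched by normalizing the text before parsing it: grouping characters are dropped and the chosen decimal separator is mapped to the period that BigDecimal expects. The method name and the choice of BigDecimal are illustrative assumptions.

```java
import java.math.BigDecimal;

public class SeparatorSketch {
    // Normalizes a localized number string, then parses it as a BigDecimal.
    static BigDecimal parse(String text, char thousandsSep, char decimalSep) {
        StringBuilder normalized = new StringBuilder();
        for (char c : text.trim().toCharArray()) {
            if (c == thousandsSep) {
                continue;                 // drop grouping characters
            } else if (c == decimalSep) {
                normalized.append('.');   // BigDecimal expects '.' as the decimal point
            } else {
                normalized.append(c);
            }
        }
        return new BigDecimal(normalized.toString());
    }

    public static void main(String[] args) {
        // Thousands separator '.', decimal separator ',' (a common European convention).
        System.out.println(parse("1.234,56", '.', ',')); // 1234.56
    }
}
```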

Ignore the namespaces

Select this check box to ignore namespaces.

Generate a temporary file: Click the three-dot button to browse to the temporary XML file and set its path in the field.
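Why ignoring namespaces matters can be seen with the JDK XPath engine: on a namespace-aware parse, a plain path such as /root/item matches only elements with no namespace, so it finds nothing in a document that declares a default namespace. Matching on local-name() is one common plain-Java workaround; it is used here only to illustrate the idea and is not necessarily what the component does internally. The sample document is invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceSketch {
    // Sample document whose elements live in a default namespace.
    static final String XML = "<root xmlns=\"urn:example\"><item/><item/></root>";

    // Parses the XML namespace-aware and counts the nodes matched by an XPath expression.
    static int count(String xml, String xpath) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed names match only elements with no namespace, so nothing is found.
        System.out.println(count(XML, "/root/item")); // 0
        // local-name() compares only the local part, so the namespace is effectively ignored.
        System.out.println(count(XML, "/*[local-name()='root']/*[local-name()='item']")); // 2
    }
}
```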

Use Separator for mode Xerces

Select this check box to separate the concatenated values of child nodes.

Note:

This field can only be used if the selected Generation mode is Xerces.

When this check box is selected, the following field is displayed:

Field separator: Define the delimiter to be used to separate the child node values.
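The idea behind the separator can be sketched with the JDK DOM API: without a separator, the text values of child elements run together; with one, they stay distinguishable. The method name and the sample document are hypothetical, and the sketch does not use Xerces-specific APIs.

```java
import java.io.StringReader;
import java.util.StringJoiner;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildSeparatorSketch {
    // Joins the text of the root element's child elements with the given separator.
    // An empty separator reproduces plain concatenation.
    static String childValues(String xml, String separator) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            StringJoiner joined = new StringJoiner(separator);
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) { // skip whitespace text nodes
                    joined.add(child.getTextContent());
                }
            }
            return joined.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<names><n>Ann</n><n>Bob</n></names>";
        System.out.println(childValues(xml, ""));  // AnnBob
        System.out.println(childValues(xml, ";")); // Ann;Bob
    }
}
```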

Encoding

Select the encoding from the list or select Custom and define it manually. This field is compulsory for database data handling. The supported encodings depend on the JVM that you are using. For more information, see https://docs.oracle.com.
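Because the supported encodings depend on the JVM, a custom encoding name can be checked with java.nio.charset.Charset before use. The sketch below does that and then decodes the same byte in two encodings to show why the setting matters; the helper name is illustrative only.

```java
import java.nio.charset.Charset;

public class EncodingSketch {
    // Decodes raw bytes with a named encoding, failing early if this JVM does not support it.
    static String decode(byte[] bytes, String encodingName) {
        if (!Charset.isSupported(encodingName)) {
            throw new IllegalArgumentException("Unsupported encoding on this JVM: " + encodingName);
        }
        return new String(bytes, Charset.forName(encodingName));
    }

    public static void main(String[] args) {
        byte[] latin1E = {(byte) 0xE9}; // e with acute accent (U+00E9) encoded as ISO-8859-1
        System.out.println(decode(latin1E, "ISO-8859-1").equals("\u00e9")); // true
        // The same byte is not valid UTF-8, so it decodes to the replacement character.
        System.out.println(decode(latin1E, "UTF-8").equals("\uFFFD"));      // true
    }
}
```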

Generation mode

From the drop-down list, select the generation mode for the XML file according to the available memory and the desired speed:

  • Slow and memory-consuming (Dom4j)

    Note:

    This option lets you use dom4j to process highly complex XML files.

  • Memory-consuming (Xerces)

  • Fast with low memory consumption (SAX)
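The memory trade-off between the tree-based modes and SAX comes from how the document is read. A SAX parser reports elements as events and keeps no tree, as in the JDK-based sketch below, which counts elements in a single streaming pass. This illustrates the general technique only; it is not the code behind the SAX generation mode.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCountSketch {
    // Counts elements with the given name without building a document tree in memory.
    static int countElements(String xml, String elementName) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(elementName)) {
                    count[0]++; // one event per start tag; nothing else is retained
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countElements("<orders><order/><order/></orders>", "order")); // 2
    }
}
```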

Validate date

Select this check box to check the date format strictly against the input schema.
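Strict date checking can be pictured with java.text.SimpleDateFormat in non-lenient mode: a lenient parser silently rolls 2020-02-30 over to March 1, while a strict one rejects it. The sketch assumes the schema's date pattern is yyyy-MM-dd and uses UTC to avoid daylight-saving gaps; it illustrates the concept rather than the component's exact validation.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class StrictDateSketch {
    // Returns true if the whole value parses as a date in the given pattern.
    static boolean accepts(String value, String pattern, boolean strict) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(!strict);                      // strict mode rejects impossible dates
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // avoid DST gaps in the default zone
        ParsePosition pos = new ParsePosition(0);
        Date parsed = format.parse(value, pos);
        return parsed != null && pos.getIndex() == value.length(); // no trailing characters
    }

    public static void main(String[] args) {
        System.out.println(accepts("2020-02-30", "yyyy-MM-dd", false)); // true (rolled over)
        System.out.println(accepts("2020-02-30", "yyyy-MM-dd", true));  // false
    }
}
```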

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared.

NB_LINE: the number of rows processed. This is an After variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
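After variables are read from globalMap once the component has finished, typically with an expression such as ((Integer)globalMap.get("tFileInputXML_1_NB_LINE")) in a later tJava. The sketch below mimics that lookup with a HashMap standing in for globalMap; the component label tFileInputXML_1 and the summary method are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVariableSketch {
    // Builds a summary from a component's After variables read from a globalMap-like map.
    // Keys follow the "<component label>_<VARIABLE>" pattern; the label is an assumption.
    static String summary(Map<String, Object> globalMap, String label) {
        Integer nbLine = (Integer) globalMap.get(label + "_NB_LINE");
        String error = (String) globalMap.get(label + "_ERROR_MESSAGE");
        return label + " read " + nbLine + " rows" + (error == null ? "" : ", last error: " + error);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for the Job's globalMap
        globalMap.put("tFileInputXML_1_NB_LINE", 3);
        System.out.println(summary(globalMap, "tFileInputXML_1")); // tFileInputXML_1 read 3 rows
    }
}
```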

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For more information about variables, see the Talend Studio User Guide.

Usage

Usage rule

tFileInputXML is used as an entry component. It lets you create a flow of XML data using a Row > Main link. You can also create a reject flow using a Row > Reject link to filter out the data that does not match the defined type. For an example of how to use these two links, see Procedure.