tFileInputXML Properties - 6.1

Talend Components Reference Guide


Component family

XML or File/Input

 

Basic settings

Property type

Either Built-In or Repository.

 

 

Built-In: No property data stored centrally.

 

 

Repository: Select the repository file where the properties are stored.

 

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

 

 

Built-In: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.

 

 

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: see Talend Studio User Guide.

 

File name/Stream

File name: Name and path of the file to be processed.

Stream: The data flow to be processed. The data must be added to the flow in order for tFileInputXML to fetch these data via the corresponding representative variable.

This variable may already be predefined in your Studio, or provided by the context or by the components used along with this component, for example, the INPUT_STREAM variable of tFileFetch. Otherwise, you can define it manually and use it according to the design of your Job, for example, using tJava or tJavaFlex.

To avoid typing the variable name by hand, you can select it from the auto-completion list (Ctrl+Space) to fill the current field, provided that the variable has been properly defined.

For the available variables, see Talend Studio User Guide. For a scenario that uses the input stream, see Scenario 2: Reading data from a remote file in streaming mode.
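As a rough illustration of the Stream option, the sketch below mimics how a stream might be published by one component and picked up by an expression typed in the File name/Stream field. It assumes the stream is exchanged through the Job's globalMap (a java.util.Map), which is simulated here with a HashMap so that the code runs outside the Studio. The key tFileFetch_1_INPUT_STREAM assumes a tFileFetch component named tFileFetch_1; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class StreamVariableSketch {

    // Stand-in for the Job-wide globalMap (assumption: simulated for a standalone run).
    static final Map<String, Object> globalMap = new HashMap<>();

    // What an upstream component (or a tJava) might do: publish a stream under a key.
    static void publishStream(String xml) {
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        globalMap.put("tFileFetch_1_INPUT_STREAM", in);
    }

    // The kind of expression that could be entered in the File name/Stream field.
    static InputStream streamExpression() {
        return (InputStream) globalMap.get("tFileFetch_1_INPUT_STREAM");
    }

    // Helper that drains a stream into a string, only to show the data arrives intact.
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        publishStream("<root><item>1</item></root>");
        System.out.println(readAll(streamExpression()));
    }
}
```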

 

Loop XPath query

Node of the XML tree on which the loop is based.

 

Mapping

Column: Columns to map. They reflect the schema as defined in the Schema type field.

XPath Query: Enter the fields to be extracted from the structured input.

Get nodes: Select this check box to retrieve the XML content of all current nodes specified in the XPath query list, or select the check box next to specific XML nodes to retrieve only the content of the selected nodes. These nodes are important when the output flow from this component needs to use the XML structure, for example, the Document data type.

For further information about the Document type, see Talend Studio User Guide.

Note

The Get Nodes option works in the Dom4j and SAX modes, although namespaces are not supported in SAX mode. For further information about these modes, see the Generation mode list in the Advanced settings tab.
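The relationship between the Loop XPath query and the per-column XPath queries can be sketched with the JDK's own javax.xml.xpath API. This is not the code the component generates; it only illustrates the idea that the loop query selects the repeating nodes and each column query is evaluated relative to the current loop node. The class name, method name and sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LoopXPathSketch {

    // Returns one row per node matched by loopQuery; each cell is the string
    // value of the matching column query, evaluated relative to that node.
    static List<String[]> extract(String xml, String loopQuery, String[] columnQueries) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList loopNodes =
                    (NodeList) xpath.evaluate(loopQuery, doc, XPathConstants.NODESET);
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < loopNodes.getLength(); i++) {
                Node current = loopNodes.item(i);
                String[] row = new String[columnQueries.length];
                for (int c = 0; c < columnQueries.length; c++) {
                    row[c] = xpath.evaluate(columnQueries[c], current);
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate the XPath queries", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<customers>"
                + "<customer id=\"1\"><name>Ann</name></customer>"
                + "<customer id=\"2\"><name>Bob</name></customer>"
                + "</customers>";
        // Loop on each customer node; map two columns with relative queries.
        for (String[] row : extract(xml, "/customers/customer", new String[] {"@id", "name"})) {
            System.out.println(row[0] + ";" + row[1]);
        }
    }
}
```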

 

Limit

Maximum number of rows to be processed. If Limit = 0, no row is read or processed. If Limit = -1, all rows are read and processed.
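The Limit semantics can be summarized in a few lines of Java. This is only a sketch of the documented behavior for 0, -1 and positive values; treating every other negative value like -1 is an assumption made here, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LimitSketch {

    // Keeps at most 'limit' rows; 0 keeps none, -1 keeps all.
    static <T> List<T> applyLimit(List<T> rows, int limit) {
        if (limit < 0) {
            // -1 means "no limit" (other negative values: assumption).
            return new ArrayList<>(rows);
        }
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3");
        System.out.println(applyLimit(rows, 0));
        System.out.println(applyLimit(rows, 2));
        System.out.println(applyLimit(rows, -1));
    }
}
```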

 

Die on error

Select this check box to stop the execution of the Job when an error occurs.

Clear the check box to skip any rows on error and complete the process for error-free rows. When errors are skipped, you can collect the rows on error using a Row > Reject link.
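The difference between the two Die on error settings can be illustrated with a small, self-contained sketch. It assumes a single Integer column and uses plain Java lists in place of the Row > Main and Row > Reject flows; the class, field and method names are hypothetical, and the component's actual generated code is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RejectFlowSketch {

    // Holds the two output flows of the sketch.
    static final class Result {
        final List<Integer> main = new ArrayList<>();
        final List<String> rejects = new ArrayList<>();
    }

    // Converts raw values to Integer. With dieOnError set, the first bad value
    // stops processing; otherwise bad values are collected as rejects.
    static Result convert(List<String> rawValues, boolean dieOnError) {
        Result result = new Result();
        for (String raw : rawValues) {
            try {
                result.main.add(Integer.valueOf(raw.trim()));
            } catch (NumberFormatException e) {
                if (dieOnError) {
                    throw new IllegalStateException("Stopping on invalid value: " + raw, e);
                }
                result.rejects.add(raw);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result result = convert(Arrays.asList("1", "abc", "3"), false);
        System.out.println("main=" + result.main + " rejects=" + result.rejects);
    }
}
```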

Advanced settings

Ignore DTD file

Select this check box to ignore the DTD file indicated in the XML file being processed.

 

Advanced separator (for number)

Select this check box to change the separator used for numbers. By default, the thousands separator is a comma (,) and the decimal separator is a period (.).

Thousands separator: define the separators to use for thousands.

Decimal separator: define the separators to use for decimals.
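To show what changing the two separators means in practice, the sketch below parses the same amount written with different conventions, using the JDK's DecimalFormat. It is an illustration only and is not necessarily how the component converts numbers internally; the class and method names are hypothetical.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

class NumberSeparatorSketch {

    // Parses 'text' using the given thousands and decimal separators.
    static double parse(String text, char thousandsSeparator, char decimalSeparator) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator(thousandsSeparator);
        symbols.setDecimalSeparator(decimalSeparator);
        DecimalFormat format = new DecimalFormat("#,##0.###", symbols);
        try {
            return format.parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Default convention: comma for thousands, period for decimals.
        System.out.println(parse("1,234.56", ',', '.'));
        // Custom convention: period for thousands, comma for decimals.
        System.out.println(parse("1.234,56", '.', ','));
    }
}
```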

 

Ignore the namespaces

Select this check box to ignore namespaces.

Generate a temporary file: click the three-dot button to browse to the XML temporary file and set its path in the field.

 

Use Separator for mode Xerces

Select this check box if you want to separate concatenated child node values.

Note

This field can only be used if the selected Generation mode is Xerces.

The following field displays:

Field separator: Define the delimiter to be used to separate the child node values.
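The effect of a field separator on concatenated child values can be pictured as follows. This sketch uses the JDK DOM API and assumes that "children node values" means the text of each child element of the selected node; the component's exact behavior in Xerces mode may differ. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildSeparatorSketch {

    // Joins the text of every child element of the root, inserting 'separator'
    // between consecutive values. An empty separator reproduces plain concatenation.
    static String joinChildren(String xml, String separator) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            StringBuilder out = new StringBuilder();
            boolean first = true;
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and other non-element nodes
                }
                if (!first) {
                    out.append(separator);
                }
                out.append(child.getTextContent());
                first = false;
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<tags><tag>red</tag><tag>green</tag><tag>blue</tag></tags>";
        System.out.println(joinChildren(xml, ""));
        System.out.println(joinChildren(xml, ";"));
    }
}
```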

 

Encoding

Select the encoding from the list or select Custom and define it manually. This field is compulsory for database data handling.

 

Generation mode

From the drop-down list select the generation mode for the XML file, according to the memory available and the desired speed:

  • Slow and memory-consuming (Dom4j)

    Note

    This option allows you to use dom4j to process the XML files of high complexity.

  • Memory-consuming (Xerces).

  • Fast with low memory consumption (SAX)

 

Validate date

Select this check box to check the date format strictly against the input schema.
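Strict date checking can be illustrated with the JDK's SimpleDateFormat, whose lenient mode silently rolls over impossible dates. The sketch below is an analogy for what a strict check rejects, not a description of the component's internal implementation; the class and method names are hypothetical, and the pattern stands in for the date pattern defined in the schema.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

class StrictDateSketch {

    // Returns true when 'text' matches 'pattern' completely. In strict mode,
    // impossible dates such as February 30 are refused instead of rolled over.
    static boolean matches(String text, String pattern, boolean strict) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(!strict);
        ParsePosition position = new ParsePosition(0);
        Date parsed = format.parse(text, position);
        // Require that the whole input was consumed, not just a valid prefix.
        return parsed != null && position.getIndex() == text.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("2015-02-30", "yyyy-MM-dd", false));
        System.out.println(matches("2015-02-30", "yyyy-MM-dd", true));
        System.out.println(matches("2015-02-28", "yyyy-MM-dd", true));
    }
}
```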

 

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable is available only when the Die on error check box is cleared (if the component has this check box).

NB_LINE: the number of rows processed. This is an After variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
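As a sketch of how these variables might be read in a downstream tJava, the example below looks them up in the Job's globalMap under keys built from the component name and the variable name. The component name tFileInputXML_1 is an assumption (use the name shown in your Job), and the globalMap is simulated with a HashMap so that the code runs outside the Studio. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVariablesSketch {

    // Builds a short report from the After variables of a component assumed
    // to be named tFileInputXML_1.
    static String summary(Map<String, Object> globalMap) {
        Integer nbLine = (Integer) globalMap.get("tFileInputXML_1_NB_LINE");
        String error = (String) globalMap.get("tFileInputXML_1_ERROR_MESSAGE");
        return "rows=" + nbLine + ", error=" + (error == null ? "none" : error);
    }

    public static void main(String[] args) {
        // Simulated values, as they might look after the component has finished.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileInputXML_1_NB_LINE", 3);
        System.out.println(summary(globalMap));
    }
}
```

Because both variables are After variables, such a lookup is only meaningful once the component has finished, for example in a subjob triggered by an OnSubjobOk link.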

Usage

tFileInputXML is used as an entry component. It allows you to create a flow of XML data using a Row > Main link. You can also create a rejection flow using a Row > Reject link to filter out the data that does not match the defined type. For an example of how to use these two links, see Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

n/a