tEXistGet - 6.3

Talend Open Studio for Big Data Components Reference Guide

Function

This component retrieves resources from a remote eXist DB server.

Purpose

tEXistGet downloads selected resources from a remote DB server to a defined local directory.

tEXistGet properties

Component family

Databases/eXist

 

Basic settings

Use an existing connection/Component List

Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.

Note that when a Job contains a parent Job and a child Job, the Component List presents only the connection components at the same Job level.

 

URI

URI of the database you want to connect to.

 

Collection

Enter the path to the collection of interest on the database server.

 

Driver

This field is automatically populated with the standard driver.

Note: Users can enter a different driver, depending on their needs.

 

Username and Password

User authentication information.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

 

Local directory

Path to the local directory in which the downloaded files are saved.

 

Files

Click the plus button to add the lines you want to use as filters:

Filemask: enter the filename or a filemask using wildcard characters (*) or regular expressions.
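To illustrate what a wildcard filemask selects, the following is a minimal sketch that filters a list of resource names with a glob pattern. It uses the standard Java PathMatcher and is not the component's own matching code, which may behave differently. The class name FilemaskSketch, the method filter, and the sample resource names other than dictionary_en.xml are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilemaskSketch {

    // Returns the names that match a glob-style filemask such as "*.xml".
    // Assumption: glob semantics from java.nio, used here only for illustration.
    static List<String> filter(List<String> names, String filemask) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + filemask);
        List<String> matched = new ArrayList<>();
        for (String name : names) {
            if (matcher.matches(Paths.get(name))) {
                matched.add(name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Hypothetical resource names in a collection.
        List<String> resources = Arrays.asList("dictionary_en.xml", "dictionary_fr.xml", "readme.txt");

        // A filemask with a wildcard selects a set of files.
        System.out.println(filter(resources, "dictionary_*.xml"));

        // A complete file name selects a single file.
        System.out.println(filter(resources, "dictionary_en.xml"));
    }
}
```

With the sample names above, the first filemask selects both dictionary files and the second selects only dictionary_en.xml.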

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Global Variables

NB_FILE: the number of files processed. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
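As a rough illustration of how After variables are typically read in Job code, the sketch below looks up NB_FILE and ERROR_MESSAGE in a map keyed by component name and variable name. It assumes the component carries the default label tEXistGet_1, and it uses a plain HashMap as a stand-in for the globalMap object that the Studio makes available in a Job. The class and helper method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVariablesSketch {

    // Returns the NB_FILE value stored for the given component, or 0 if it is absent.
    static int nbFile(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_NB_FILE");
        return value instanceof Integer ? (Integer) value : 0;
    }

    // Returns the ERROR_MESSAGE value stored for the given component, or null if none was set.
    static String errorMessage(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_ERROR_MESSAGE");
        return value instanceof String ? (String) value : null;
    }

    public static void main(String[] args) {
        // Stand-in for the Job's globalMap (assumption: in a real Job the Studio populates it).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tEXistGet_1_NB_FILE", 1);

        System.out.println("Files processed: " + nbFile(globalMap, "tEXistGet_1"));
        System.out.println("Error message: " + errorMessage(globalMap, "tEXistGet_1"));
    }
}
```

In a tJava component placed after the subjob, the equivalent lookup would be an expression such as ((Integer) globalMap.get("tEXistGet_1_NB_FILE")). Because both are After variables, they are meaningful only once tEXistGet has finished executing.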

Usage

This component is typically used as a single-component subjob but can also be used as an output or end object.

eXist-db is an open source database management system built using XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.

For further information about XQuery, see XQuery.

For further information about the XQuery update extension, see XQuery update extension.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the Install button on the Component tab view. You can also find and add all missing JARs on the Modules tab in the Integration perspective of your Studio. For details, see the article Installing External Modules on Talend Help Center (https://help.talend.com) or the section on how to configure the Studio in the Talend Installation and Upgrade Guide.

Scenario: Retrieve resources from a remote eXist DB server

This is a single-component Job that retrieves data from a remote eXist DB server and downloads the data to a defined local directory.

This simple Job requires one component: tEXistGet.

  1. Drop the tEXistGet component from the Palette into the design workspace.

  2. Double-click the tEXistGet component to open the Component view and define the properties in its Basic settings view.

  3. Fill in the URI field with the URI of the eXist database you want to connect to.

    In this scenario, the URI is xmldb:exist://192.168.0.165:8080/exist/xmlrpc. Note that the URI used in this use case is for demonstration purposes only and is not an active address.

  4. Fill in the Collection field with the path to the collection of interest on the database server, /db/talend in this scenario.

  5. Fill in the Driver field with the driver for the XML database, org.exist.xmldb.DatabaseImpl in this scenario.

  6. Fill in the Username and Password fields by typing in admin and talend respectively in this scenario.

  7. Click the three-dot button next to the Local directory field to set a path for saving the XML file downloaded from the remote database server.

    In this scenario, set the path to your desktop, for example C:/Documents and Settings/galano/Desktop/ExistGet.

  8. In the Files field, click the plus button to add a new line in the Filemask area, and fill it with a complete file name to retrieve data from a particular file on the server, or a filemask to retrieve data from a set of files. In this scenario, fill in dictionary_en.xml.

  9. Save your Job and press F6 to execute it.

    The XML file dictionary_en.xml is retrieved and downloaded to the defined local directory.
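The settings used in this scenario can be read together as one source address and one target file. The sketch below joins the URI and Collection values into the usual form of an eXist XML:DB collection URI and resolves the resource name against the local directory. How tEXistGet combines these values internally is an assumption, and the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExistGetSettingsSketch {

    // Joins the server URI and the collection path with exactly one slash between them.
    static String collectionUri(String uri, String collection) {
        String base = uri.endsWith("/") ? uri.substring(0, uri.length() - 1) : uri;
        String path = collection.startsWith("/") ? collection : "/" + collection;
        return base + path;
    }

    // Resolves the downloaded resource name against the local directory.
    static Path targetFile(String localDirectory, String resourceName) {
        return Paths.get(localDirectory).resolve(resourceName);
    }

    public static void main(String[] args) {
        // Values taken from the scenario; the address is for demonstration only.
        String uri = "xmldb:exist://192.168.0.165:8080/exist/xmlrpc";
        String collection = "/db/talend";
        String localDirectory = "C:/Documents and Settings/galano/Desktop/ExistGet";
        String filemask = "dictionary_en.xml";

        System.out.println("Source collection: " + collectionUri(uri, collection));
        System.out.println("Target file: " + targetFile(localDirectory, filemask));
    }
}
```

For the scenario values, the source collection is xmldb:exist://192.168.0.165:8080/exist/xmlrpc/db/talend.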