tFTPFileList - 6.1

Talend Components Reference Guide

tFTPFileList properties

Component family

Internet/FTP

 

Function

tFTPFileList iterates on files and/or folders of a given directory on a remote host.

Purpose

tFTPFileList retrieves files and/or folders based on a defined filemask pattern and iterates on each of them by connecting to a remote directory via the FTP protocol.

Basic settings

Property Type

Either Built-in or Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

 

 

Built-in: No property data stored centrally.

 

 

Repository: Select the Repository file where properties are stored. The following fields are pre-filled in using fetched data.

 

Use an existing connection/Component List

Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.

 

Host

IP address of the FTP server.

 

Port

Listening port number of the FTP server.

 

Username and Password (or Private key)

User authentication information.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

 

Remote directory

Path to the remote directory.

 

Move to the current directory

This option appears when Use an existing connection is enabled. Select this check box to change the directory to the one specified in the Remote directory field. The next FTP component that is linked to the tFTPFileList in the Job will take this directory as the root of the remote directory when using the same connection.

 

File detail

Select this check box if you want to display the details of each of the files or folders on the remote host. These informative details include:

type of rights on the file/folder, name of the author, name of the group of users that have read-write rights, file size, and date of last modification.

 

SFTP Support

Select this check box to connect to the FTP server via an SFTP connection. The following properties will be available:

Authentication method: Select the SFTP authentication method, either Public key or Password.

  • Public key: Enter the path to the private key and the passphrase for the key in the Private key and Key Passphrase fields respectively.

  • Password: Enter the password required.

Filename encoding: Select this check box to set the encoding used to convert file names from Strings to bytes. It should be the same encoding used on the SFTP server.

Note

If the SFTP server's version is greater than 3, the encoding should be UTF-8, or else an error occurs.
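To illustrate why the filename encoding has to match the one used on the server, the following minimal Java sketch encodes a file name with UTF-8 and decodes the resulting bytes with ISO-8859-1. It is only an illustration of the general encoding issue, not the component's internal code; the class name, method name and sample file name are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class FilenameEncodingSketch {

    // Encode a file name as UTF-8 bytes, then decode those bytes as ISO-8859-1,
    // mimicking a client and a server that disagree on the filename encoding.
    static String decodeWithWrongCharset(String fileName) {
        byte[] utf8Bytes = fileName.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical file name containing two accented characters (U+00E9).
        String fileName = "r\u00e9sum\u00e9.csv";

        // The same name yields a different number of bytes per encoding.
        System.out.println(fileName.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(fileName.getBytes(StandardCharsets.ISO_8859_1).length);

        // With mismatched encodings the round trip does not return the original name.
        System.out.println(decodeWithWrongCharset(fileName).equals(fileName));
    }
}
```

Names made up of ASCII characters only survive such a mismatch unchanged, which is why the problem tends to show up only with accented or non-Latin file names.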

 

Files

Click the plus button to add the lines you want to use as filters:

Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.
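As a rough illustration of how a wildcard filemask selects file names, the Java sketch below translates a mask such as *.csv into a regular expression and filters a list of names with it. This is an assumption about the general technique, not the component's actual matching code; FileMaskSketch, maskToPattern and filter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class FileMaskSketch {

    // Translate a wildcard mask into a regular expression:
    // '*' matches any run of characters, every other character is matched literally.
    static Pattern maskToPattern(String mask) {
        StringBuilder regex = new StringBuilder();
        for (char c : mask.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Keep only the names that match the mask as a whole.
    static List<String> filter(List<String> names, String mask) {
        Pattern pattern = maskToPattern(mask);
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

For example, filtering the names sales.csv, notes.txt and stock.csv with the mask *.csv would keep sales.csv and stock.csv only.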

 

Connection Mode

Select the FTP connection mode you want to use:

Active: you determine the connection port to be used to allow data transfer.

Passive: the FTP server determines the connection port to be used to allow data transfer.
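In passive mode, the FTP protocol (RFC 959) has the server announce the data port in its reply to the PASV command, in the form (h1,h2,h3,h4,p1,p2), where the port is p1 * 256 + p2. The Java sketch below parses such a reply to show how the server-chosen port is obtained. It illustrates the protocol only and says nothing about how the component is implemented; the class name and the sample reply are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PassiveReplySketch {

    // Matches the "(h1,h2,h3,h4,p1,p2)" part of a 227 reply.
    private static final Pattern PASV =
            Pattern.compile("\\((\\d+),(\\d+),(\\d+),(\\d+),(\\d+),(\\d+)\\)");

    private static Matcher match(String reply) {
        Matcher m = PASV.matcher(reply);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a passive mode reply: " + reply);
        }
        return m;
    }

    // Host announced by the server for the data connection.
    static String dataHost(String reply) {
        Matcher m = match(reply);
        return m.group(1) + "." + m.group(2) + "." + m.group(3) + "." + m.group(4);
    }

    // Port announced by the server for the data connection: p1 * 256 + p2.
    static int dataPort(String reply) {
        Matcher m = match(reply);
        return Integer.parseInt(m.group(5)) * 256 + Integer.parseInt(m.group(6));
    }

    public static void main(String[] args) {
        // Made-up reply used only for illustration.
        String reply = "227 Entering Passive Mode (192,168,1,10,195,80)";
        System.out.println(dataHost(reply) + ":" + dataPort(reply));
    }
}
```

In active mode the roles are reversed: the client announces the address and port on which it listens, and the server opens the data connection to it.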

 

Encoding

Select an encoding type from the list, or select Custom and define it manually. This field is compulsory for DB data handling.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

CURRENT_FILE: the current file name. This is a Flow variable and it returns a string.

CURRENT_FILEPATH: the current file path. This is a Flow variable and it returns a string.

NB_FILE: the number of files processed. This is an After variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
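The difference between Flow and After variables can be sketched in plain Java with a stand-in for globalMap. The code below is a simulation, not code generated by the Studio: the key tFTPFileList_1_CURRENT_FILEPATH is the one shown in the variable list, the keys for CURRENT_FILE and NB_FILE are assumed to follow the same naming pattern, and building the path as directory + "/" + name is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class GlobalVariablesSketch {

    // Simulate one run of the component: the Flow variables are refreshed on
    // every iteration, the After variable is written once the loop has finished.
    static List<String> iterate(Map<String, Object> globalMap,
                                String remoteDir,
                                List<String> fileNames) {
        List<String> seenPaths = new ArrayList<>();
        int count = 0;
        for (String name : fileNames) {
            // Flow variables: available while the component is executing.
            globalMap.put("tFTPFileList_1_CURRENT_FILE", name);
            globalMap.put("tFTPFileList_1_CURRENT_FILEPATH", remoteDir + "/" + name);

            // What a component on the Iterate link would read in this iteration.
            seenPaths.add((String) globalMap.get("tFTPFileList_1_CURRENT_FILEPATH"));
            count++;
        }
        // After variable: only meaningful once the component has finished.
        globalMap.put("tFTPFileList_1_NB_FILE", count);
        return seenPaths;
    }
}
```

After the simulated run, a later subjob could read the file count with ((Integer)globalMap.get("tFTPFileList_1_NB_FILE")), while the Flow variables simply hold the values of the last iteration.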

Usage

This component is typically used as a single-component sub-job but can also be used with other components.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the Install button on the Component tab view. You can also find out and add all missing JARs easily on the Modules tab in the Integration perspective of your studio. For details, see https://help.talend.com/display/KB/How+to+install+external+modules+in+the+Talend+products or the section describing how to configure the Studio in the Talend Installation Guide.

Scenario: Iterating on a remote directory

The following scenario describes a three-component Job that connects to an FTP server, lists the files held in a remote directory based on a filemask, and finally retrieves the files and saves them in a defined local directory.

Dropping and linking components

  1. Drop the following components from the Palette to the design workspace: tFTPConnection, tFTPFileList and tFTPGet.

  2. Link tFTPConnection to tFTPFileList using an OnSubjobOk connection and then tFTPFileList to tFTPGet using an Iterate connection.

Configuring the components

Configuring a connection to the FTP server

  1. Double-click tFTPConnection to display its Basic settings view and define the component properties.

  2. In the Host field, enter the IP address of the FTP server.

  3. In the Port field, enter the listening port number.

  4. In the Username and Password fields, enter your authentication information for the FTP server.

  5. In the Connect Mode list, select the FTP connection mode you want to use, Passive in this example.

Configuring an FTP download list

  1. Double-click tFTPFileList to open its Basic settings view and define the component properties.

  2. Select the Use an existing connection check box and in the Component list, click the relevant FTP connection component, tFTPConnection_1 in this scenario. Connection information is automatically filled in.

  3. In the Remote directory field, enter the relative path of the directory that holds the files to be listed. Clear the Move to the current directory check box.

  4. In the Filemask field, click the plus button to add one line and then define a file mask to filter the data to be retrieved. You can use special characters if need be. In this example, we want to retrieve only delimited files (*.csv).

Configuring file download

  1. Double-click tFTPGet to display its Basic settings view and define the component properties.

  2. Select the Use an existing connection check box and in the Component list, click the relevant FTP connection component, tFTPConnection_1 in this scenario. Connection information is automatically filled in.

  3. In the Local directory field, enter the relative path for the output local directory where you want to write the retrieved files.

  4. In the Remote directory field, enter the relative path of the remote directory that holds the files to be retrieved. Clear the Move to the current directory check box.

  5. In the Transfer Mode list, select the FTP transfer mode you want to use, ascii in this example.

  6. In the Overwrite file field, select the option you want to use for the transferred files.

  7. In the Files area, click the plus button to add a line in the Filemask list, then click in the added line and press Ctrl+Space to access the variable list. In the list, select the global variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILEPATH")) to process all files in the remote directory.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 or click Run on the Run tab to execute the Job.

    All .csv files held in the remote directory on the FTP server are listed, as defined in the filemask. The files are then retrieved and saved in the defined local output directory.