tFTPFileList - 6.3

Talend Open Studio for Big Data Components Reference Guide


Function

tFTPFileList iterates on files and/or folders of a given directory on a remote host.

Purpose

tFTPFileList connects to a remote directory over FTP, retrieves the files and/or folders that match a defined filemask pattern, and iterates on each of them.

tFTPFileList properties

Component family

Internet/FTP

 

Basic settings

Property Type

Either Built-in or Repository.

Since version 5.6, both the Built-in mode and the Repository mode are available in all Talend solutions.

 

 

Built-in: No property data stored centrally.

 

 

Repository: Select the Repository file where the properties are stored. The fields that follow are pre-filled with the fetched data.

 

Use an existing connection/Component List

Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.

 

Host

IP address of the FTP server.

 

Port

Listening port number of the FTP server.

 

Username and Password (or Private key)

User authentication information.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

 

Remote directory

Path to the remote directory.

 

Move to the current directory

This option appears when Use an existing connection is enabled. Select this check box to change the directory to the one specified in the Remote directory field. The next FTP component that is linked to the tFTPFileList in the Job will take this directory as the root of the remote directory when using the same connection.

 

File detail

Select this check box if you want to display the details of each of the files or folders on the remote host. These informative details include:

the type of rights on the file/folder, the name of the author, the name of the user group that has read-write rights, the file size, and the date of last modification.
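For orientation, the following Java sketch splits one listing line into the fields named above. The line layout is an assumption: many FTP servers return listings in a Unix `ls -l` style, but the format is not standardized, and the output of this component may look different. The sample line, the class name FileDetailSketch, and its field names are invented for illustration.

```java
class FileDetailSketch {
    final String rights, owner, group, modified, name;
    final long size;

    // Assumed layout: rights links owner group size month day time-or-year name
    FileDetailSketch(String line) {
        // Limit of 9 keeps a file name that contains spaces in one piece
        String[] f = line.trim().split("\\s+", 9);
        if (f.length < 9) {
            throw new IllegalArgumentException("Unexpected listing line: " + line);
        }
        rights = f[0];
        owner = f[2];
        group = f[3];
        size = Long.parseLong(f[4]);
        modified = f[5] + " " + f[6] + " " + f[7];
        name = f[8];
    }

    public static void main(String[] args) {
        // Hypothetical listing line, not real server output
        FileDetailSketch d =
            new FileDetailSketch("-rw-r--r-- 1 alice staff 1024 Jan 05 10:30 report.txt");
        System.out.println(d.name + ": rights=" + d.rights + ", owner=" + d.owner
            + ", group=" + d.group + ", size=" + d.size + ", modified=" + d.modified);
    }
}
```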

 

SFTP Support

Select this check box to connect to the FTP server via an SFTP connection. The following properties will be available:

Authentication method: Select the SFTP authentication method, either Public key or Password.

  • Public key: Enter the path to the private key and the passphrase for the key in the Private key and Key Passphrase fields respectively.

  • Password: Enter the password required.

Filename encoding: Select this check box to set the encoding used to convert file names from Strings to bytes. It should be the same encoding used on the SFTP server.

Note

If the SFTP server's version is greater than 3, the encoding should be UTF-8, or else an error occurs.
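To see why the encoding has to match the one used on the server, the following Java sketch encodes the same file name with two charsets and then decodes the bytes with the wrong one. It is only an illustration of String-to-byte conversion in general: the file name is made up, the class name FilenameEncodingSketch is hypothetical, and no SFTP connection is involved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FilenameEncodingSketch {
    // Converts a file name to the bytes that would be sent with the given charset
    static byte[] encode(String fileName, Charset charset) {
        return fileName.getBytes(charset);
    }

    // Converts received bytes back to a file name using the given charset
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        String name = "r\u00e9sum\u00e9.txt"; // "résumé.txt", written with escapes
        byte[] utf8 = encode(name, StandardCharsets.UTF_8);
        byte[] latin1 = encode(name, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 bytes: " + utf8.length
            + ", ISO-8859-1 bytes: " + latin1.length);
        // Decoding with a different charset than the one used for encoding garbles the name
        System.out.println("Mismatched decode: " + decode(utf8, StandardCharsets.ISO_8859_1));
        System.out.println("Matching decode:   " + decode(utf8, StandardCharsets.UTF_8));
    }
}
```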

 

Files

Click the plus button to add the lines you want to use as filters:

Filemask: enter the file name or a filemask that uses wildcard characters (*) or regular expressions.
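As an illustration of what a wildcard filemask means, the following Java sketch translates a mask such as "*.txt" into a regular expression and tests file names against it. This is an assumption about the general technique, not the component's actual matching code; the class FilemaskSketch and its methods maskToRegex and matches are hypothetical names.

```java
import java.util.regex.Pattern;

class FilemaskSketch {
    // Turns a wildcard mask into a regex: '*' matches any run of characters,
    // everything else is matched literally
    static String maskToRegex(String mask) {
        String[] parts = mask.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return regex.toString();
    }

    // True when the whole file name matches the mask
    static boolean matches(String mask, String fileName) {
        return Pattern.matches(maskToRegex(mask), fileName);
    }

    public static void main(String[] args) {
        // Made-up file names
        for (String name : new String[] {"notes.txt", "notes.txt.bak", "data.csv"}) {
            System.out.println(name + " matches *.txt: " + matches("*.txt", name));
        }
    }
}
```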

 

Connection Mode

Select the FTP connection mode you want to use:

Active: you determine the connection port to be used for data transfer.

Passive: the FTP server determines the connection port to be used for data transfer.
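The difference between the two modes is visible in the FTP control-channel messages defined in RFC 959: in active mode the client announces its own address and port with a PORT command, while in passive mode the server announces a port in its 227 reply to PASV. The following Java sketch only formats and parses these messages. The address and port values are made up, the class name FtpModeSketch is hypothetical, the exact wording of a 227 reply varies between servers, and no connection is opened.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FtpModeSketch {
    // Active mode: the client chooses the data port and sends PORT h1,h2,h3,h4,p1,p2
    static String portCommand(String ipv4, int port) {
        return "PORT " + ipv4.replace('.', ',') + "," + (port / 256) + "," + (port % 256);
    }

    // Passive mode: the server chooses the data port and reports it as (h1,h2,h3,h4,p1,p2)
    static int passivePort(String reply) {
        Matcher m = Pattern
            .compile("\\((\\d+),(\\d+),(\\d+),(\\d+),(\\d+),(\\d+)\\)")
            .matcher(reply);
        if (!m.find()) {
            throw new IllegalArgumentException("No address found in reply: " + reply);
        }
        return Integer.parseInt(m.group(5)) * 256 + Integer.parseInt(m.group(6));
    }

    public static void main(String[] args) {
        System.out.println(portCommand("192.168.1.10", 5001));
        System.out.println(passivePort("227 Entering Passive Mode (192,168,1,2,19,137)"));
    }
}
```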

 

Encoding

Select an encoding type from the list, or select Custom and define it manually. This field is compulsory for DB data handling.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

CURRENT_FILE: the current file name. This is a Flow variable and it returns a string.

CURRENT_FILEPATH: the current file path. This is a Flow variable and it returns a string.

NB_FILE: the number of files processed. This is an After variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to open the variable list and select the variable to use.

For further information about variables, see Talend Studio User Guide.
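The difference between Flow and After variables can be sketched in plain Java. The example simulates the globalMap with a HashMap and uses the key form found elsewhere in this document (tFTPFileList_1_CURRENT_FILE). The key for NB_FILE is assumed to follow the same pattern, the path is assumed to be the remote directory followed by the file name, and the loop and file names are stand-ins rather than code generated by the Studio.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GlobalVariablesSketch {
    // Simulates one run: Flow variables change on every iteration,
    // After variables are set once when the iteration has finished
    static List<String> run(Map<String, Object> globalMap, String remoteDir, List<String> names) {
        List<String> seen = new ArrayList<>();
        int count = 0;
        for (String name : names) {
            globalMap.put("tFTPFileList_1_CURRENT_FILE", name);
            globalMap.put("tFTPFileList_1_CURRENT_FILEPATH", remoteDir + name); // assumed path form

            // What a downstream component would read during this iteration
            String currentFile = (String) globalMap.get("tFTPFileList_1_CURRENT_FILE");
            String currentPath = (String) globalMap.get("tFTPFileList_1_CURRENT_FILEPATH");
            seen.add(currentFile + " -> " + currentPath);
            count++;
        }
        globalMap.put("tFTPFileList_1_NB_FILE", count); // assumed key, available only afterwards
        return seen;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        for (String line : run(globalMap, "/", Arrays.asList("notes.txt", "data.csv"))) {
            System.out.println(line);
        }
        Integer nbFile = (Integer) globalMap.get("tFTPFileList_1_NB_FILE");
        System.out.println("NB_FILE = " + nbFile);
    }
}
```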

Usage

This component is typically used as a single-component sub-job but can also be used with other components.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the Install button on the Component tab view. You can also find and add all missing JARs on the Modules tab in the Integration perspective of your Studio. For details, see the article Installing External Modules on Talend Help Center (https://help.talend.com) or the section on how to configure the Studio in the Talend Installation and Upgrade Guide.

Scenario: Listing and getting files/folders on an FTP directory

This scenario uses Talend FTP components to iterate over and list all files and folders in an FTP server directory, and then download only the text files in that directory to a local directory.

Creating a Job for listing and getting files/folders on an FTP directory

Create a Job that connects to an FTP server, iterates over and lists all files and folders in the FTP root directory, downloads only the text files from that directory to a local directory, and finally closes the connection to the server.

Prerequisites: To replicate this scenario, you need a running FTP server with a few files and folders in its root directory.

  1. Create a new Job and add a tFTPConnection component, a tFTPFileList component, a tIterateToFlow component, a tLogRow component, a tFTPGet component, and a tFTPClose component by typing their names in the design workspace or dropping them from the Palette.

  2. Link the tFTPFileList component to the tIterateToFlow component using a Row > Iterate connection.

  3. Link the tIterateToFlow component to the tLogRow component using a Row > Main connection.

  4. Link the tFTPConnection component to the tFTPFileList component using a Trigger > OnSubjobOk connection.

  5. Do the same to link the tFTPFileList component to the tFTPGet component, and the tFTPGet component to the tFTPClose component.
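The Trigger > OnSubjobOk links mean that each subjob starts only after the previous one has finished without error. The following Java sketch mimics that ordering. The subjob bodies are empty placeholders, and the class SubjobChainSketch and its method runChain are hypothetical names, not code generated by the Studio.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SubjobChainSketch {
    // Runs the subjobs in insertion order and stops at the first failure,
    // so later subjobs are never triggered after an error
    static List<String> runChain(Map<String, Runnable> subjobs) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Runnable> subjob : subjobs.entrySet()) {
            try {
                subjob.getValue().run();
            } catch (RuntimeException e) {
                break; // the subjob failed, so the "OK" trigger does not fire
            }
            completed.add(subjob.getKey());
        }
        return completed;
    }

    public static void main(String[] args) {
        Map<String, Runnable> job = new LinkedHashMap<>();
        job.put("tFTPConnection", () -> { });
        job.put("tFTPFileList -> tIterateToFlow -> tLogRow", () -> { });
        job.put("tFTPGet", () -> { });
        job.put("tFTPClose", () -> { });
        System.out.println(runChain(job));
    }
}
```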

Opening a connection to the FTP server

Configure the tFTPConnection component to open a connection to the FTP server.

  1. Double-click the tFTPConnection component to open its Basic settings view.

  2. In the Host and Port fields, enter the FTP server IP address and the listening port number respectively.

  3. In the Username and Password fields, enter the authentication details.

Listing all files/folders on the FTP root directory

Configure the tFTPFileList component, the tIterateToFlow component, and the tLogRow component to iterate all files and folders on the FTP root directory and display the names and paths of these files and folders on the console of Talend Studio.

  1. Double-click the tFTPFileList component to open its Basic settings view.

  2. Specify the connection details required to access the FTP server. In this example, select the Use an existing connection check box and, from the Component list drop-down list that appears, select the connection component whose connection details you want to reuse.

  3. In the Remote directory field, specify the FTP server directory on which the files and folders will be iterated. In this example, it is /, which means the root directory of the FTP server.

  4. Clear the Move to the current directory check box.

  5. Double-click the tIterateToFlow component to open its Basic settings view.

  6. Click the [...] button next to Edit schema to open the schema dialog box.

  7. Click the [+] button to add two String type columns, filename and filepath, that will hold the names and paths of the iterated files respectively. When done, click OK to close the dialog box.

  8. In the Mapping table, set the values for the filename and filepath columns. In this example, use the global variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILE")) for filename and the global variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILEPATH")) for filepath.

    Note that you can fill the values by pressing Ctrl + Space to access the global variables list and then selecting tFTPFileList_1_CURRENT_FILE and tFTPFileList_1_CURRENT_FILEPATH from the list.

  9. Double-click the tLogRow component to open its Basic settings view, and then select Table (print values in cells of a table) in the Mode area for better readability of the result.

Getting files on the FTP server directory to a local directory

Configure the tFTPGet component to get only the text files on the FTP root directory to a local directory.

  1. Double-click the tFTPGet component to open its Basic settings view.

  2. Specify the connection details required to access the FTP server. In this example, select the Use an existing connection check box and, from the Component list drop-down list that appears, select the connection component whose connection details you want to reuse.

  3. In the Local directory field, specify the local directory to which the files and folders will be downloaded. In this example, it is D:/FtpDownloads.

  4. In the Remote directory field, specify the FTP server directory from which the files will be downloaded. In this example, it is /, which means the root directory of the FTP server.

  5. In the Files table, click the [+] button to add a line and in the Filemask column field, enter *.txt between double quotation marks to get only the text files on the FTP directory to the local directory.

Closing the connection to the FTP server

Configure the tFTPClose component to close the connection to the FTP server.

  1. Double-click the tFTPClose component to open its Basic settings view.

  2. From the Component list drop-down list, select the tFTPConnection component that opens the connection you need to close. In this example, only one tFTPConnection component is used and it is selected by default.

Executing the Job to list and get files/folders on the FTP directory

After setting up the Job and configuring its components, execute the Job and verify the result.

  1. Press Ctrl + S to save the Job.

  2. Press F6 to execute the Job.

    The names and paths of the files and folders in the FTP server root directory are displayed on the console, and only the text files are downloaded to the specified local directory.