Scenario: Listing and getting files/folders on an FTP directory - 6.3

Talend Open Studio for Big Data Components Reference Guide

This scenario uses Talend FTP components to list all files and folders in a directory on an FTP server, and then download only the text files in that directory to a local directory.

Creating a Job for listing and getting files/folders on an FTP directory

Create a Job that connects to an FTP server, iterates over and lists all files and folders in the FTP root directory, downloads only the text files from that directory to a local directory, and finally closes the connection to the server.

Prerequisites: To replicate this scenario, an FTP server must be running and its root directory must contain a few files and folders.

  1. Create a new Job and add a tFTPConnection component, a tFTPFileList component, a tIterateToFlow component, a tLogRow component, a tFTPGet component, and a tFTPClose component by typing their names in the design workspace or dropping them from the Palette.

  2. Link the tFTPFileList component to the tIterateToFlow component using a Row > Iterate connection.

  3. Link the tIterateToFlow component to the tLogRow component using a Row > Main connection.

  4. Link the tFTPConnection component to the tFTPFileList component using a Trigger > OnSubjobOk connection.

  5. In the same way, use Trigger > OnSubjobOk connections to link the tFTPFileList component to the tFTPGet component, and the tFTPGet component to the tFTPClose component.
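
The trigger links above define an execution order: each subjob starts only after the previous one has finished without error. The following is a minimal sketch of that ordering in plain Java. It is not the code Talend Studio generates; the class SubjobChainSketch, the helper runChain, and the placeholder step bodies are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical model of subjobs chained with OnSubjobOk triggers.
// This is NOT Talend-generated code; it only illustrates the ordering.
final class SubjobChainSketch {

    // Runs the steps in insertion order and stops at the first failure,
    // mirroring "start the next subjob only if the previous one succeeded".
    static List<String> runChain(Map<String, Callable<Boolean>> steps) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Callable<Boolean>> step : steps.entrySet()) {
            boolean ok;
            try {
                ok = step.getValue().call();
            } catch (Exception e) {
                ok = false; // an exception counts as a failed subjob
            }
            if (!ok) {
                break; // later subjobs are not triggered
            }
            completed.add(step.getKey());
        }
        return completed;
    }

    public static void main(String[] args) {
        // Placeholder steps; a real Job would perform FTP operations here.
        Map<String, Callable<Boolean>> steps = new LinkedHashMap<>();
        steps.put("tFTPConnection_1", () -> true);
        steps.put("tFTPFileList_1", () -> true);
        steps.put("tFTPGet_1", () -> true);
        steps.put("tFTPClose_1", () -> true);
        System.out.println(runChain(steps));
    }
}
```

In this model, a failure in any step means the steps after it never run, which is why the tFTPClose subjob is reached only when listing and downloading both succeed.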

Opening a connection to the FTP server

Configure the tFTPConnection component to open a connection to the FTP server.

  1. Double-click the tFTPConnection component to open its Basic settings view.

  2. In the Host and Port fields, enter the FTP server IP address and the listening port number respectively.

  3. In the Username and Password fields, enter the authentication details.

Listing all files/folders on the FTP root directory

Configure the tFTPFileList component, the tIterateToFlow component, and the tLogRow component to iterate all files and folders on the FTP root directory and display the names and paths of these files and folders on the console of Talend Studio.

  1. Double-click the tFTPFileList component to open its Basic settings view.

  2. Specify the connection details required to access the FTP server. In this example, select the Use an existing connection check box and, from the Component list drop-down list that appears, select the tFTPConnection component to reuse the connection details you have already defined.

  3. In the Remote directory field, specify the FTP server directory whose files and folders will be iterated. In this example, it is /, which is the root directory of the FTP server.

  4. Clear the Move to the current directory check box.

  5. Double-click the tIterateToFlow component to open its Basic settings view.

  6. Click the [...] button next to Edit schema to open the schema dialog box.

  7. Click the [+] button to add two String type columns, filename and filepath, which will hold the names and paths of the iterated files and folders respectively. When done, click OK to close the dialog box.

  8. In the Mapping table, set the values for the filename and filepath columns. In this example, enter the global variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILE")) for filename and the global variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILEPATH")) for filepath.

    You can also fill in these values by pressing Ctrl + Space to open the global variables list and then selecting tFTPFileList_1_CURRENT_FILE and tFTPFileList_1_CURRENT_FILEPATH from the list.

  9. Double-click the tLogRow component to open its Basic settings view, and then select Table (print values in cells of a table) in the Mode area for better readability of the result.
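
The mapping expressions in step 8 read, for each iteration, the values stored under the two tFTPFileList_1 keys and cast them to String. The sketch below shows that lookup in isolation, assuming a plain java.util.Map stands in for globalMap. The class GlobalMapSketch, the helper toRow, and the sample file name and path are hypothetical; the values a real Job stores depend on the server contents.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the lookup done by the Mapping table expressions.
final class GlobalMapSketch {

    // Builds one two-column row (filename, filepath) from the values
    // stored for the current iteration.
    static String[] toRow(Map<String, Object> globalMap) {
        String filename = ((String) globalMap.get("tFTPFileList_1_CURRENT_FILE"));
        String filepath = ((String) globalMap.get("tFTPFileList_1_CURRENT_FILEPATH"));
        return new String[] { filename, filepath };
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Assumed sample values for a single iteration (illustration only).
        globalMap.put("tFTPFileList_1_CURRENT_FILE", "notes.txt");
        globalMap.put("tFTPFileList_1_CURRENT_FILEPATH", "/notes.txt");

        String[] row = toRow(globalMap);
        System.out.println(row[0] + " | " + row[1]);
    }
}
```

The cast is needed because the map holds values of type Object; a key that has not been set yields null rather than an error.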

Getting files on the FTP server directory to a local directory

Configure the tFTPGet component to get only the text files on the FTP root directory to a local directory.

  1. Double-click the tFTPGet component to open its Basic settings view.

  2. Specify the connection details required to access the FTP server. In this example, select the Use an existing connection check box and, from the Component list drop-down list that appears, select the tFTPConnection component to reuse the connection details you have already defined.

  3. In the Local directory field, specify the local directory to which the files and folders will be downloaded. In this example, it is D:/FtpDownloads.

  4. In the Remote directory field, specify the FTP server directory from which the files will be downloaded. In this example, it is /, which is the root directory of the FTP server.

  5. In the Files table, click the [+] button to add a line and, in the Filemask column, enter "*.txt" (including the double quotation marks) to download only the text files from the FTP directory to the local directory.
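
To see which names a mask such as "*.txt" selects, the sketch below applies the mask with the glob matcher from java.nio.file. This is an assumption made for illustration: the tFTPGet component may evaluate its Filemask differently, and the class FilemaskSketch, the helper matches, and the sample file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Illustrates wildcard matching with Java NIO glob syntax.
// Assumption: glob semantics are used here as a stand-in for the Filemask.
final class FilemaskSketch {

    // Returns true if the bare file name matches the given mask.
    static boolean matches(String filemask, String fileName) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + filemask);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        // Hypothetical file names on the FTP root directory.
        String[] names = { "notes.txt", "report.csv", "archive" };
        for (String name : names) {
            System.out.println(name + " -> " + matches("*.txt", name));
        }
    }
}
```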

Closing the connection to the FTP server

Configure the tFTPClose component to close the connection to the FTP server.

  1. Double-click the tFTPClose component to open its Basic settings view.

  2. From the Component list drop-down list, select the tFTPConnection component that opens the connection you need to close. In this example, only one tFTPConnection component is used and it is selected by default.

Executing the Job to list and get files/folders on the FTP directory

After setting up the Job and configuring its components, execute the Job and verify the result.

  1. Press Ctrl + S to save the Job.

  2. Press F6 to execute the Job.

    The names and paths of the files and folders on the FTP server root directory are displayed on the console, and only the text files are downloaded to the specified local directory.
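
One way to check the download result outside Talend Studio is to list the text files in the local directory. The sketch below does this with Files.newDirectoryStream from the Java standard library. The class DownloadCheckSketch and the helper listTextFiles are hypothetical; the default path D:/FtpDownloads is the local directory used in this scenario and can be overridden with a command-line argument.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for checking which text files reached the local directory.
final class DownloadCheckSketch {

    // Returns the sorted names of the entries in dir that match *.txt.
    static List<String> listTextFiles(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the local directory configured in the tFTPGet component.
        Path dir = Paths.get(args.length > 0 ? args[0] : "D:/FtpDownloads");
        if (!Files.isDirectory(dir)) {
            System.out.println("Directory not found: " + dir);
            return;
        }
        for (String name : listTextFiles(dir)) {
            System.out.println(name);
        }
    }
}
```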