Scenario: Iterating on a remote directory - 6.1

Talend Open Studio for Big Data Components Reference Guide

The following scenario describes a three-component Job that connects to an FTP server, lists the files in a remote directory that match a filemask, and then retrieves those files and saves them in a defined local directory.

Dropping and linking components

  1. Drop the following components from the Palette to the design workspace: tFTPConnection, tFTPFileList and tFTPGet.

  2. Link tFTPConnection to tFTPFileList using an OnSubjobOk connection and then tFTPFileList to tFTPGet using an Iterate connection.
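
The two links behave differently: OnSubjobOk starts the next subjob only after the previous one finishes without error, while Iterate runs the downstream component once per listed item. The following is a minimal sketch of that control flow in plain Java. It is not the code Talend generates, and the class name JobFlowSketch, the method run and the sample file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Job's control flow; not Talend-generated code.
class JobFlowSketch {

    // Returns a log of the steps that would run for the given inputs.
    static List<String> run(boolean connectionOk, List<String> listedFiles) {
        List<String> log = new ArrayList<>();
        log.add(connectionOk ? "connect: ok" : "connect: failed");

        // OnSubjobOk: the listing subjob starts only if the connection succeeded.
        if (!connectionOk) {
            return log;
        }

        // Iterate: the download step runs once for each listed file.
        for (String file : listedFiles) {
            log.add("get: " + file);
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [connect: ok, get: a.csv, get: b.csv]
        System.out.println(run(true, List.of("a.csv", "b.csv")));
    }
}
```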

Configuring the components

Configuring a connection to the FTP server

  1. Double-click tFTPConnection to display its Basic settings view and define the component properties.

  2. In the Host field, enter the IP address of the FTP server.

  3. In the Port field, enter the listening port number.

  4. In the Username and Password fields, enter your authentication information for the FTP server.

  5. In the Connect Mode list, select the FTP connection mode you want to use, Passive in this example.
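
As a rough illustration of what these five settings amount to, the sketch below groups them in a small Java class and checks the port range. Everything in it is an assumption made for illustration: the class FtpSettingsSketch, the sample host 192.0.2.10 (a documentation-only address), port 21 (the conventional FTP control port) and the credentials are not taken from the scenario.

```java
// Hypothetical holder for the Basic settings of the connection component.
class FtpSettingsSketch {

    enum ConnectMode { ACTIVE, PASSIVE }

    final String host;
    final int port;
    final String username;
    final String password;
    final ConnectMode mode;

    FtpSettingsSketch(String host, int port, String username,
                      String password, ConnectMode mode) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host is required");
        }
        // Valid TCP ports range from 1 to 65535.
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        this.host = host;
        this.port = port;
        this.username = username;
        this.password = password;
        this.mode = mode;
    }

    // Summarizes the settings without exposing the password.
    String describe() {
        return username + "@" + host + ":" + port + " (" + mode + ")";
    }

    public static void main(String[] args) {
        FtpSettingsSketch settings = new FtpSettingsSketch(
                "192.0.2.10", 21, "etl_user", "secret", ConnectMode.PASSIVE);
        // prints etl_user@192.0.2.10:21 (PASSIVE)
        System.out.println(settings.describe());
    }
}
```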

Configuring an FTP download list

  1. Double-click tFTPFileList to open its Basic settings view and define the component properties.

  2. Select the Use an existing connection check box and in the Component list, select the relevant FTP connection component, tFTPConnection_1 in this scenario. The connection information is filled in automatically.

  3. In the Remote directory field, enter the relative path of the directory that holds the files to be listed. Clear the Move to the current directory check box.

  4. In the Filemask field, click the plus button to add one line and then define a file mask to filter the files to be retrieved. You can use special characters if need be. In this example, we want to retrieve only delimited files (*.csv).
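
To see how a wildcard mask narrows a file list, the sketch below applies a glob pattern with the standard Java PathMatcher. It assumes the filemask behaves like a simple glob; the component's own matching rules may differ. The class FilemaskSketch, the method filter and the sample file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of mask filtering; not how the component is implemented.
class FilemaskSketch {

    // Returns the file names that match the glob-style mask, in their original order.
    static List<String> filter(String mask, List<String> names) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (matcher.matches(Paths.get(name))) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> remote = List.of("orders.csv", "clients.csv", "readme.txt");
        // prints [orders.csv, clients.csv]
        System.out.println(filter("*.csv", remote));
    }
}
```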

Configuring file download

  1. Double-click tFTPGet to display its Basic settings view and define the component properties.

  2. Select the Use an existing connection check box and in the Component list, select the relevant FTP connection component, tFTPConnection_1 in this scenario. The connection information is filled in automatically.

  3. In the Local directory field, enter the relative path of the local output directory where you want to write the retrieved files.

  4. In the Remote directory field, enter the relative path of the remote directory that holds the files to be retrieved. Clear the Move to the current directory check box.

  5. In the Transfer Mode list, select the FTP transfer mode you want to use, ascii in this example.

  6. In the Overwrite file field, select the option you want to use for the transferred files.

  7. In the Files area, click the plus button to add a line in the Filemask list, then click in the added line and press Ctrl+Space to access the variable list. In the list, select the global variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILEPATH")) to process all files in the remote directory.
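
The expression in step 7 reads the path of the file currently being iterated from a shared map and casts it to a String. The sketch below imitates that mechanism with an ordinary HashMap. It assumes globalMap is a map from String keys to Object values, which is what the cast in the expression suggests; the class GlobalMapSketch, the method iterate and the sample paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical imitation of the per-iteration variable lookup.
class GlobalMapSketch {

    // Returns the value the Filemask expression would yield on each iteration.
    static List<String> iterate(List<String> remotePaths) {
        // Assumption: a String-to-Object map shared between components.
        Map<String, Object> globalMap = new HashMap<>();
        List<String> masks = new ArrayList<>();
        for (String path : remotePaths) {
            // The listing step publishes the current file path under this key.
            globalMap.put("tFTPFileList_1_CURRENT_FILEPATH", path);
            // The download step evaluates the expression from step 7.
            String mask = ((String) globalMap.get("tFTPFileList_1_CURRENT_FILEPATH"));
            masks.add(mask);
        }
        return masks;
    }

    public static void main(String[] args) {
        // prints [in/orders.csv, in/clients.csv]
        System.out.println(iterate(List.of("in/orders.csv", "in/clients.csv")));
    }
}
```

Because the value changes on every iteration, each run of the download step sees exactly one file path rather than the whole list.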

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 or click Run on the Run tab to execute the Job.

    All .csv files held in the remote directory on the FTP server are listed, as defined by the filemask. The files are then retrieved and saved in the defined local output directory.