Scenario: Listing files with the same prefix from a bucket - 6.3

Talend Open Studio for Big Data Components Reference Guide

In this scenario, tS3List is used to list all the files in a bucket that share the same prefix.

The bucket used in this scenario already contains several files.

To learn how to create a bucket and put files into it, see Scenario: Verifying the absence of a bucket, creating it and listing all the S3 buckets and Scenario: File exchanges with Amazon S3.

Linking the components

  1. Drop tS3Connection, tS3List, tIterateToFlow, tLogRow and tS3Close onto the workspace.

  2. Link tS3Connection to tS3List using the OnSubjobOk trigger.

  3. Link tS3List to tIterateToFlow using the Row > Iterate connection.

  4. Link tIterateToFlow to tLogRow using the Row > Main connection.

  5. Link tS3List to tS3Close using the OnSubjobOk trigger.

Configuring the components

  1. Double-click tS3Connection to open its Basic settings view.

  2. In the Access Key and Secret Key fields, enter the authentication credentials.

  3. Double-click tS3List to open its Basic settings view.

  4. Select the Use existing connection check box to reuse the connection.

  5. In the Bucket area, click the [+] button to add one line.

  6. In the Bucket name and Key prefix fields, enter the bucket name and file prefix.

    This way, only files with the specified prefix will be listed.

  7. Double-click tIterateToFlow to open its Basic settings view.

  8. Click Edit schema to open the schema editor.

    Click the [+] button to add a column named file_list of the String type.

    Click OK to validate the setup and close the schema editor.

  9. In the Mapping area, press Ctrl + Space in the Value field and choose the variable tS3List_1_CURRENT_KEY, which holds the key of the file currently being iterated over.

  10. Double-click tLogRow to open its Basic settings view.

    Select Table (print values in cells of a table) for a better display of the results.

  11. Double-click tS3Close to open its Basic settings view.

    Because the Job contains only one connection component, it is selected by default and there is no need to choose one.
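
The configuration above can be pictured with a small plain-Java sketch. It is not the code that the Studio generates and it does not call Amazon S3. It only mimics, under stated assumptions, what the Key prefix filter and the tIterateToFlow mapping do: keys that start with the prefix are kept, each one is published under the name tS3List_1_CURRENT_KEY, and the mapping copies that value into the file_list column. The class name, the method name, the local globalMap stand-in and the sample keys are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PrefixListingSketch {

    // Local stand-in (assumption) for the map a Job uses to share variables
    // between components.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Mimics tS3List followed by tIterateToFlow:
    // keep only the keys that start with keyPrefix and emit one row per key.
    static List<String> listWithPrefix(List<String> bucketKeys, String keyPrefix) {
        List<String> fileListColumn = new ArrayList<>();
        for (String key : bucketKeys) {
            // Prefix matching is a case-sensitive "starts with" test.
            if (!key.startsWith(keyPrefix)) {
                continue;
            }
            // One iteration: the current key is published for downstream use.
            globalMap.put("tS3List_1_CURRENT_KEY", key);
            // The mapping reads the variable and fills the file_list column.
            String fileList = (String) globalMap.get("tS3List_1_CURRENT_KEY");
            fileListColumn.add(fileList);
        }
        return fileListColumn;
    }

    public static void main(String[] args) {
        // Hypothetical bucket content, used only for illustration.
        List<String> keys = Arrays.asList("report_01.csv", "notes.txt", "report_02.csv");
        for (String row : listWithPrefix(keys, "report_")) {
            System.out.println(row);
        }
    }
}
```

With the hypothetical keys above and the prefix report_, only report_01.csv and report_02.csv reach the output, in the order in which they were listed.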

Executing the Job

  1. Press Ctrl + S to save the Job.

  2. Press F6 to run the Job.