Scenario: Listing files with the same prefix from a bucket - 6.1

Talend Components Reference Guide

In this scenario, tS3List is used to list all the files in a bucket that share the same prefix.

The bucket used in this scenario contains several files, some of which have names beginning with "in".

For details on how to create a bucket and put files into it, see Scenario: Verifying the absence of a bucket, creating it and listing all the S3 buckets and Scenario: File exchanges with Amazon S3.

Linking the components

  1. Drop tS3Connection, tS3List, tIterateToFlow, tLogRow and tS3Close onto the workspace.

  2. Link tS3Connection to tS3List using the OnSubjobOk trigger.

  3. Link tS3List to tIterateToFlow using the Row > Iterate connection.

  4. Link tIterateToFlow to tLogRow using the Row > Main connection.

  5. Link tS3List to tS3Close using the OnSubjobOk trigger.
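
The links above determine the order in which the components run. The following is a minimal Java sketch of that order, not the code that Talend Studio generates. The class JobOrderSketch, the method run, and the key names are hypothetical, and an in-memory list stands in for the bucket, so no connection to Amazon S3 is made.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the Job's links; it only records the order of the steps.
final class JobOrderSketch {

    static List<String> run(List<String> keys) {
        List<String> trace = new ArrayList<>();
        trace.add("tS3Connection"); // first subjob: open the connection
        // OnSubjobOk: the listing subjob starts once the first subjob has finished without error
        for (String key : keys) { // Row > Iterate: one pass per key returned by tS3List
            // Row > Main: each pass becomes one row sent on to tLogRow
            trace.add("tIterateToFlow -> tLogRow: " + key);
        }
        // OnSubjobOk: the connection is closed only after the listing subjob has finished
        trace.add("tS3Close");
        return trace;
    }

    public static void main(String[] args) {
        // Hypothetical keys, used only for illustration.
        run(List.of("in_a.txt", "in_b.txt")).forEach(System.out::println);
    }
}
```

Because both triggers are OnSubjobOk, a failure in an earlier subjob would prevent the later ones from starting; the sketch does not model that error path.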

Configuring the components

  1. Double-click tS3Connection to open its Basic settings view.

  2. In the Access Key and Secret Key fields, enter the authentication credentials.

  3. Double-click tS3List to open its Basic settings view.

  4. Select the Use existing connection check box to reuse the connection.

  5. In the Bucket area, click the [+] button to add one line.

  6. In the Bucket name and Key prefix fields, enter the bucket name and file prefix.

    This way, only files with the specified prefix will be listed.

  7. Double-click tIterateToFlow to open its Basic settings view.

  8. Click Edit schema to open the schema editor.

    Click the [+] button to add one column, namely file_list of the String type.

    Click OK to validate the setup and close the schema editor.

  9. In the Mapping area, press Ctrl + Space in the Value field to choose the variable tS3List_1_CURRENT_KEY.

  10. Double-click tLogRow to open its Basic settings view.

    Select Table (print values in cells of a table) for a better display of the results.

  11. Double-click tS3Close to open its Basic settings view.

    There is no need to select a connection component, because the only connection in the Job is selected by default.
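
The combined effect of the Key prefix setting and the tS3List_1_CURRENT_KEY variable can be sketched in plain Java. This is an illustration under stated assumptions rather than the generated Job code: the class PrefixListingSketch, the method listWithPrefix, and the key names are hypothetical, and the prefix filter is applied to an in-memory list instead of a real bucket. The sketch also assumes that the variable is read from the Job's globalMap, which is where Talend Studio typically stores component variables.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the tS3List -> tIterateToFlow part of the Job.
final class PrefixListingSketch {

    // Returns one file_list value for each key that starts with the given prefix.
    static List<String> listWithPrefix(List<String> keysInBucket, String keyPrefix) {
        Map<String, Object> globalMap = new HashMap<>(); // assumed to mirror the Job's globalMap
        List<String> fileList = new ArrayList<>();
        for (String key : keysInBucket) {
            if (!key.startsWith(keyPrefix)) {
                continue; // keys without the prefix are never iterated over
            }
            // One iteration: the current key is published, then read back as the row's value.
            globalMap.put("tS3List_1_CURRENT_KEY", key);
            fileList.add((String) globalMap.get("tS3List_1_CURRENT_KEY"));
        }
        return fileList;
    }

    public static void main(String[] args) {
        // Hypothetical key names, not the files from the original scenario.
        List<String> keys = List.of("in_a.txt", "in_b.txt", "out_a.txt");
        System.out.println(listWithPrefix(keys, "in")); // prints [in_a.txt, in_b.txt]
    }
}
```

Each element of the returned list corresponds to one row of the single-column file_list schema defined in tIterateToFlow.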

Executing the Job

  1. Press Ctrl + S to save the Job.

  2. Press F6 to run the Job.

    In the console output, only the files with the prefix "in" are listed.