Scenario: Copying an S3 object from one bucket to another - 6.3

Talend Open Studio for Big Data Components Reference Guide

This scenario describes a Job that uploads a new object to an existing, empty S3 bucket, bucket-src; copies the object from bucket-src to another existing, empty S3 bucket, bucket-dst; and finally lists the objects in bucket-dst to verify that the object was copied successfully.

Setting up the Job

  1. Create a new Job and add a tS3Connection component, a tS3Put component, a tS3Copy component, a tS3List component, a tIterateToFlow component, and a tLogRow component by typing their names on the design workspace or dropping them from the Palette.

  2. Link the tS3List component to the tIterateToFlow component using a Row > Iterate connection.

  3. Link the tIterateToFlow component to the tLogRow component using a Row > Main connection.

  4. Link the tS3Connection component to the tS3Put component using a Trigger > On Subjob Ok connection.

  5. In the same way, link the tS3Put component to the tS3Copy component and the tS3Copy component to the tS3List component, each using a Trigger > On Subjob Ok connection.
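
The wiring described in the steps above can be sketched as plain data. This is only an illustrative model, not code generated by Talend Studio: the constant names and the subjob_order helper are hypothetical, and the sketch assumes the On Subjob Ok triggers form a single chain, as they do in this Job.

```python
# Hypothetical model of the Job layout; component names mirror the steps above.
ON_SUBJOB_OK = [
    ("tS3Connection", "tS3Put"),
    ("tS3Put", "tS3Copy"),
    ("tS3Copy", "tS3List"),
]

# Row links inside the last subjob: (source, target, link type).
ROW_LINKS = [
    ("tS3List", "tIterateToFlow", "Iterate"),
    ("tIterateToFlow", "tLogRow", "Main"),
]


def subjob_order(triggers):
    """Return the components in the order the trigger chain starts them."""
    targets = {dst for _, dst in triggers}
    # The chain starts at the only component that no trigger points to.
    start = next(src for src, _ in triggers if src not in targets)
    successor = dict(triggers)
    order = [start]
    while order[-1] in successor:
        order.append(successor[order[-1]])
    return order


print(" -> ".join(subjob_order(ON_SUBJOB_OK)))
# prints: tS3Connection -> tS3Put -> tS3Copy -> tS3List
```

Because each subjob starts only after the previous one finishes without error, the object is uploaded before it is copied, and copied before the destination bucket is listed.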

Configuring the components

Creating a connection to Amazon S3

  1. Double-click the tS3Connection component to open its Basic settings view on the Component tab.

  2. In the Access Key and Secret Key fields, enter the authentication credentials required to access Amazon S3.

  3. From the Region drop-down list, select an AWS region where the object will be uploaded and copied. In this example, we keep the default setting.

Uploading an object to an Amazon S3 bucket

  1. Double-click the tS3Put component to open its Basic settings view on the Component tab.

  2. Select the Use an existing connection check box to reuse the Amazon S3 connection information you have defined in the tS3Connection component.

  3. In the Bucket field, enter the name of the S3 bucket to which the object will be uploaded. In this example, it is bucket-src, which already exists in Amazon S3.

  4. In the Key field, enter the key for the object to be uploaded. In this example, it is tS3Copy_icon32_src.png.

  5. In the File field, browse to or enter the path to the local file to be uploaded. In this example, it is D:/tS3Copy_icon32.png.
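
The relationship between the Bucket, Key, and File settings can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the tS3Put implementation or an AWS SDK call: buckets are represented as a plain dictionary, put_object is a hypothetical helper, and a temporary file with placeholder bytes stands in for D:/tS3Copy_icon32.png.

```python
import tempfile
from pathlib import Path


def put_object(buckets, bucket, key, file_path):
    """Store the file's bytes under `key`; the bucket must already exist."""
    if bucket not in buckets:
        raise KeyError(f"bucket does not exist: {bucket}")
    buckets[bucket][key] = Path(file_path).read_bytes()


# bucket-src already exists and is empty, as in the scenario.
buckets = {"bucket-src": {}}

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the local file named in the File field.
    local_file = Path(tmp) / "tS3Copy_icon32.png"
    local_file.write_bytes(b"placeholder image bytes")
    put_object(buckets, "bucket-src", "tS3Copy_icon32_src.png", local_file)

print(sorted(buckets["bucket-src"]))
# prints: ['tS3Copy_icon32_src.png']
```

The object is identified in the bucket by its key, not by the local file name, which is why the key tS3Copy_icon32_src.png can differ from the file name tS3Copy_icon32.png.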

Copying the uploaded object to another Amazon S3 bucket

  1. Double-click the tS3Copy component to open its Basic settings view on the Component tab.

  2. Select the Use an existing connection check box to reuse the Amazon S3 connection information you have defined in the tS3Connection component.

  3. In the Bucket field in the Source Configuration area, enter the name of the bucket which contains the object to be copied. In this example, it is bucket-src.

  4. In the Key field in the Source Configuration area, enter the key of the object to be copied. In this example, it is tS3Copy_icon32_src.png.

  5. In the Bucket field in the Destination Configuration area, enter the name of the bucket to which the object will be copied. In this example, it is bucket-dst, an empty bucket that already exists in Amazon S3.

  6. In the Key field in the Destination Configuration area, enter the key under which the object will be stored in the destination bucket. In this example, it is tS3Copy_icon32_dst.png.
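
The effect of the four source and destination settings above can be sketched with the same kind of in-memory model. Again, this is an illustration only: copy_object is a hypothetical helper operating on a dictionary, not the tS3Copy component or an SDK method, and the object content is a placeholder.

```python
def copy_object(buckets, src_bucket, src_key, dst_bucket, dst_key):
    """Copy one object to a new bucket and key, leaving the source in place."""
    for name in (src_bucket, dst_bucket):
        if name not in buckets:
            raise KeyError(f"bucket does not exist: {name}")
    if src_key not in buckets[src_bucket]:
        raise KeyError(f"no such key in {src_bucket}: {src_key}")
    buckets[dst_bucket][dst_key] = buckets[src_bucket][src_key]


# State after the upload step: bucket-src holds the object, bucket-dst is empty.
buckets = {
    "bucket-src": {"tS3Copy_icon32_src.png": b"placeholder image bytes"},
    "bucket-dst": {},
}

copy_object(
    buckets,
    src_bucket="bucket-src",
    src_key="tS3Copy_icon32_src.png",
    dst_bucket="bucket-dst",
    dst_key="tS3Copy_icon32_dst.png",
)

print(sorted(buckets["bucket-src"]))  # prints: ['tS3Copy_icon32_src.png']
print(sorted(buckets["bucket-dst"]))  # prints: ['tS3Copy_icon32_dst.png']
```

A copy does not remove the source object, so after this step both buckets contain the same content under different keys.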

Listing the object in the destination bucket

  1. Double-click the tS3List component to open its Basic settings view on the Component tab.

  2. Select the Use an existing connection check box to reuse the Amazon S3 connection information you have defined in the tS3Connection component.

  3. Clear the List all buckets objects check box, then click the [+] button to add a row to the Bucket table and set the value of each column. In this example, enter bucket-dst in the Bucket name column and leave the Key prefix column empty, so that only the objects in the bucket-dst bucket are listed.

  4. Double-click the tIterateToFlow component to open its Basic settings view on the Component tab.

  5. Click the [...] button next to Edit schema and, in the schema dialog box that opens, define the schema by adding one column named ObjectList of type String.

  6. Click OK to save the changes and, in the pop-up dialog box, click Yes to accept the propagation.

  7. Double-click the tLogRow component to open its Basic settings view on the Component tab.
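
The listing settings above can be sketched as follows. This is a simplified model, not the behavior of the components as implemented: list_keys and iterate_to_flow are hypothetical helpers, the buckets are a plain dictionary, and the sketch assumes that each listed key becomes one row with the single ObjectList column defined in the schema.

```python
def list_keys(buckets, bucket, key_prefix=""):
    """Return the keys in one bucket that start with `key_prefix`."""
    return sorted(k for k in buckets[bucket] if k.startswith(key_prefix))


def iterate_to_flow(keys, column="ObjectList"):
    """Turn each iterated key into a one-column row."""
    return [{column: key} for key in keys]


# State after the copy step.
buckets = {
    "bucket-src": {"tS3Copy_icon32_src.png": b"placeholder image bytes"},
    "bucket-dst": {"tS3Copy_icon32_dst.png": b"placeholder image bytes"},
}

# One row in the Bucket table: bucket-dst with an empty key prefix.
rows = iterate_to_flow(list_keys(buckets, "bucket-dst", key_prefix=""))

for row in rows:
    print(row["ObjectList"])
# prints: tS3Copy_icon32_dst.png
```

An empty prefix matches every key, so the whole of bucket-dst is listed, while bucket-src is never consulted because it has no row in the Bucket table.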