Using Amazon S3 with Talend Integration Cloud

Author: Irshad Burtally
Product: Talend Cloud
Versions: 6.4, 6.3, 6.2, 6.1, 6.0
Platform: Talend Studio, Talend Integration Cloud

Talend Integration Cloud (TIC) is a secure cloud integration platform-as-a-service (iPaaS) with graphical tools for building integrations. It automates the use of Amazon Web Services so that you can integrate your cloud and on-premises data securely. This article explains how to use Talend Integration Cloud to transfer files and data into and out of Amazon S3.

Amazon S3

Amazon Simple Storage Service (S3) is storage for the Internet. It is designed to make web-scale computing easier for developers. Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of object data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize the benefits of scale and to pass those benefits on to developers.

Amazon S3 stores data as objects, i.e. files. It is not a database storage layer. Also note that Amazon Glacier leverages the Amazon S3 data storage infrastructure for archiving purposes.

Amazon S3 Components in Talend Studio

Talend provides several components in the components palette, as shown below, that are built around the operations exposed by Amazon S3.

These operations, as can be seen in the screenshot, are:

  • Create an S3 Bucket
  • Delete an S3 Bucket
  • Check existence of an S3 Bucket
  • List all the S3 Buckets
  • Put files into S3 Bucket
  • Get files from S3 Bucket
  • List files in S3 Bucket
  • Delete files in S3 Bucket
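To make the eight operations concrete, here is a minimal in-memory sketch of them. It is not Talend or AWS code: the class `InMemoryS3` and its method names are hypothetical, and a plain dictionary stands in for the real service.

```python
class InMemoryS3:
    """Toy stand-in for S3 (hypothetical): bucket name -> {object key -> bytes}."""

    def __init__(self):
        self._buckets = {}

    # Bucket-level operations
    def create_bucket(self, bucket):
        self._buckets.setdefault(bucket, {})

    def delete_bucket(self, bucket):
        del self._buckets[bucket]

    def bucket_exists(self, bucket):
        return bucket in self._buckets

    def list_buckets(self):
        return sorted(self._buckets)

    # File-level operations
    def put(self, bucket, key, data):
        self._buckets[bucket][key] = data

    def get(self, bucket, key):
        return self._buckets[bucket][key]

    def list_files(self, bucket, prefix=""):
        # Keep only the keys that start with the given prefix.
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))

    def delete(self, bucket, key):
        del self._buckets[bucket][key]
```

Each method corresponds to one bullet in the list above, in the same order.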

These components are used within the Talend Integration Cloud Actions as described below.

Amazon S3 Actions For Talend Integration Cloud

Talend Integration Cloud makes it easy to use Amazon S3 through the following eight actions (available from the Talend Exchange):

  • awss3_files_list_source
  • awss3_file_upload_target
  • awss3_file_upload_propagate_step
  • awss3_file_move_target
  • awss3_file_download_source
  • awss3_file_download_process_step
  • awss3_file_delete_target
  • awss3_file_copy_target

As a Talend Integration Cloud user, you must import the actions into your TIC Personal or Shared Workspace before you can use them. You only need to import the actions you plan to use.

Talend Integration Cloud Actions are of type Source, Target, and Step. For more information on how to import and use actions in your flows, refer to the documentation Getting Started with the web application.
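The eight action names listed earlier each end in `_source`, `_target`, or `_step`. The sketch below groups them by that suffix. Treating the suffix as the action type is an observation about these names, not a documented Talend rule, and the helper `action_type` is hypothetical.

```python
ACTIONS = [
    "awss3_files_list_source",
    "awss3_file_upload_target",
    "awss3_file_upload_propagate_step",
    "awss3_file_move_target",
    "awss3_file_download_source",
    "awss3_file_download_process_step",
    "awss3_file_delete_target",
    "awss3_file_copy_target",
]

ACTION_TYPES = ("source", "target", "step")


def action_type(action_name):
    """Return 'Source', 'Target' or 'Step' based on the name suffix (assumed convention)."""
    suffix = action_name.rsplit("_", 1)[-1]
    if suffix not in ACTION_TYPES:
        raise ValueError(f"cannot infer action type from {action_name!r}")
    return suffix.capitalize()


# Group the action names by their inferred type.
by_type = {}
for name in ACTIONS:
    by_type.setdefault(action_type(name), []).append(name)
```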

Amazon S3 Connection

Talend Integration Cloud lets you define an AWS S3 Connection. To do so, navigate to the Connections list as shown below, and then click Create New Connection.

This opens the New Connection window shown below.

In the connection window below, provide a name that makes the connection easy to identify, and then the Access Key and Secret Key needed to perform the S3 Bucket and File operations.

The flows that use these S3 Actions are executed on the Talend Cloud Engines. The best way to access S3 from the Talend AWS infrastructure is therefore with an Access Key and Secret Key. For more information on Access Keys, refer to the article Managing Access Keys for your AWS Account.
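As an illustration of what a connection carries, the sketch below models it as a small record. The two key names match the context parameters used by the actions in this article. The class `S3Connection`, the `name` field, and the placeholder values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class S3Connection:
    """Hypothetical model of an S3 connection; not a Talend class."""

    name: str
    connection_awss3_access_key: str
    # repr=False keeps the secret out of logs and debug output.
    connection_awss3_secret_key: str = field(repr=False)

    def context(self):
        """Return the two values as the context parameters the actions expect."""
        return {
            "connection_awss3_access_key": self.connection_awss3_access_key,
            "connection_awss3_secret_key": self.connection_awss3_secret_key,
        }


# Placeholder values only; never hard-code real keys.
conn = S3Connection("my-s3", "EXAMPLE_ACCESS_KEY", "EXAMPLE_SECRET_KEY")
```

Excluding the secret from the printed representation is a design choice of this sketch, not a description of how Talend Integration Cloud stores connections.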

Amazon S3 Files List

Action: awss3_files_list_source

This Source Action returns a list of files stored on Amazon S3. It creates a connection to Amazon S3, gets a list of files, filters the list of files accordingly and then sets the filename for each file into the flow, as shown by the Action Design below.

Context Parameters

S3 Connection (as described above):

  • connection_awss3_access_key : Access key ID of the Amazon S3 account to be used
  • connection_awss3_secret_key : Secret Access key of the Amazon S3 account to be used

General:

  • Bucket: Name of the source bucket where the files are stored
  • Folder: Path to the folder whose files are to be listed
  • File Type: Type of the files to be listed. To list all files in the folder, use “*” as the file type.

Output schema:

  • name of the bucket where the file is stored
  • path to the file to be downloaded
  • content of the file to be downloaded
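The filtering step can be sketched as follows. This is an illustration only, not the action's actual implementation: it assumes that File Type is matched against the file extension, that files in subfolders are included, and that “*” means no filtering. The function name `list_files` is hypothetical.

```python
def list_files(keys, folder, file_type):
    """Filter object keys by folder prefix and file type (assumed semantics)."""
    prefix = folder.strip("/")
    prefix = prefix + "/" if prefix else ""
    wanted = "." + file_type.lower().lstrip(".")
    selected = []
    for key in keys:
        # Skip keys outside the folder and folder placeholder keys ending in "/".
        if not key.startswith(prefix) or key.endswith("/"):
            continue
        if file_type == "*" or key.lower().endswith(wanted):
            selected.append(key)
    return selected


keys = ["in/", "in/a.csv", "in/b.txt", "in/sub/c.csv", "out/d.csv"]
csv_files = list_files(keys, "in", "csv")
all_files = list_files(keys, "in", "*")
```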

File Upload Target

Action: awss3_file_upload_target

This Target Action uploads files to Amazon S3. The screenshot below shows the design behind this action.

Context Parameters

Connection:

  • connection_awss3_access_key : Access key ID of the Amazon S3 account to be used
  • connection_awss3_secret_key : Secret Access key of the Amazon S3 account to be used

General:

  • name of the target bucket where the file is to be stored
  • path to the target file to be uploaded

File Upload Propagate

Action: awss3_file_upload_propagate_step

This Step Action uploads files to Amazon S3. The difference between this action and the previous action is that this one can be used in the middle of a flow, and it is not a target action.

Context Parameters

Connection:

  • connection_awss3_access_key : Access key ID of the Amazon S3 account to be used
  • connection_awss3_secret_key : Secret Access key of the Amazon S3 account to be used

General:

  • name of the target bucket where the file is to be stored
  • path to the target file to be uploaded

Output schema

  • name of the target bucket where the file is to be stored
  • path to the target file
  • content of the source file
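The difference between the Target and Step variants can be sketched with two functions. Both write the file, but only the Step returns a record for the next action in the flow. The function names, the dictionary used as storage, and the record field names are hypothetical.

```python
def upload_target(store, bucket, path, content):
    """Target: writes the file and ends the flow, so nothing is returned."""
    store[(bucket, path)] = content


def upload_propagate_step(store, bucket, path, content):
    """Step: writes the file and passes bucket, path and content downstream."""
    store[(bucket, path)] = content
    return {"bucket": bucket, "path": path, "content": content}


store = {}  # in-memory stand-in for S3: (bucket, path) -> bytes
upload_target(store, "demo", "out/a.csv", b"a")
record = upload_propagate_step(store, "demo", "out/b.csv", b"b")
```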

File Move Target

Action: awss3_file_move_target

This Target Action moves files on Amazon S3. To use it, fill in the parameters below.

Context Parameters

Connection:

  • connection_awss3_access_key : Access key ID of the Amazon S3 account to be used
  • connection_awss3_secret_key : Secret Access key of the Amazon S3 account to be used

General:

  • name of the source bucket where the file is stored
  • path to the source file to be moved
  • name of the target bucket
  • path to the target file
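Amazon S3 has no native rename operation, so a move is commonly expressed as a copy followed by a delete of the source. The sketch below shows that pattern on an in-memory dictionary. It is an assumption about how a move can be built, not a description of this action's internal design, and `move_file` is a hypothetical name.

```python
def move_file(store, src_bucket, src_path, dst_bucket, dst_path):
    """Move = copy to the target location, then delete the source (assumed pattern)."""
    source = (src_bucket, src_path)
    target = (dst_bucket, dst_path)
    if source not in store:
        raise KeyError(f"no such file: {src_bucket}/{src_path}")
    store[target] = store[source]
    # Guard against deleting the file when source and target are identical.
    if target != source:
        del store[source]


store = {("inbox", "a.csv"): b"data"}  # (bucket, path) -> bytes
move_file(store, "inbox", "a.csv", "archive", "2017/a.csv")
```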

File Download Source

Action: awss3_file_download_source

This Source Action downloads files stored on Amazon S3 into the Cloud Engine temp directory. The file should later be processed by the flow and then removed from the temp directory.

Context Parameters

Connection:

  • connection_awss3_access_key : Access key ID of the Amazon S3 account to be used
  • connection_awss3_secret_key : Secret Access key of the Amazon S3 account to be used

General:

  • name of the source bucket where the file is stored
  • path to the source file to be downloaded

Output schema:

  • name of the source bucket where the file is stored
  • path to the source file to be downloaded
  • content of the file to be downloaded
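The download, process, and clean-up cycle described above can be sketched with Python's standard `tempfile` module. The dictionary `store` stands in for S3, and the function names are hypothetical. The point of the sketch is that the temporary file is removed even if processing fails.

```python
import os
import tempfile


def download_to_temp(store, bucket, path):
    """Write the object to a new file in the temp directory and return its path."""
    content = store[(bucket, path)]
    fd, local_path = tempfile.mkstemp(suffix="_" + os.path.basename(path))
    with os.fdopen(fd, "wb") as handle:
        handle.write(content)
    return local_path


def process_and_clean(local_path, process):
    """Run `process` on the file content, then always delete the temp file."""
    try:
        with open(local_path, "rb") as handle:
            return process(handle.read())
    finally:
        os.remove(local_path)


store = {("demo", "in/a.csv"): b"a,b\n1,2\n"}  # (bucket, path) -> bytes
local_path = download_to_temp(store, "demo", "in/a.csv")
line_count = process_and_clean(local_path, lambda data: data.count(b"\n"))
```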

File Download Process

Action: awss3_file_download_process_step

This Step Action downloads files from Amazon S3 into the Cloud Engine temp directory. The file should later be processed by the flow and then removed from the temp directory. This action can be used in the middle of a flow.

Context Parameters

Connection:

  • connection_awss3_access_key : Access key ID of the Amazon S3 account to be used
  • connection_awss3_secret_key : Secret Access key of the Amazon S3 account to be used

General:

  • name of the source bucket where the file is stored
  • path to the source file to be downloaded

Output schema:

  • name of the bucket where the file is stored
  • path to the source file
  • content of the source file

File Delete Target

Action: awss3_file_delete_target

This Target Action deletes files stored on Amazon S3.

Context Parameters

Connection:

  • connection_awss3_access_key : Access key ID of the Amazon S3 account to be used
  • connection_awss3_secret_key : Secret Access key of the Amazon S3 account to be used

General:

  • name of the source bucket where the file is stored
  • path to the source file to be deleted

File Copy Target

Action: awss3_file_copy_target

This Target Action copies files from Amazon S3 to Amazon S3. As shown in the design of the action below, it first downloads the file to the Cloud Engine where the flow is running and then uploads the file back into S3.

Context Parameters

Connection:

  • connection_awss3_access_key: Access key ID of the Amazon S3 account to be used
  • connection_awss3_secret_key: Secret Access key of the Amazon S3 account to be used

General:

  • name of the source bucket where the file is stored
  • path to the source file to be copied
  • name of the target bucket
  • path to the target file
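The download-then-upload design can be sketched as follows, with a temporary directory playing the role of the Cloud Engine's local disk and a dictionary standing in for S3. The function name `copy_via_engine` and the storage model are hypothetical.

```python
import os
import tempfile


def copy_via_engine(store, src_bucket, src_path, dst_bucket, dst_path):
    """Copy a file by downloading it locally and uploading it to the target."""
    with tempfile.TemporaryDirectory() as workdir:
        local_path = os.path.join(workdir, os.path.basename(src_path))
        # Step 1: download the source file to local disk.
        with open(local_path, "wb") as handle:
            handle.write(store[(src_bucket, src_path)])
        # Step 2: upload the local file to the target location.
        with open(local_path, "rb") as handle:
            store[(dst_bucket, dst_path)] = handle.read()
    # Leaving the `with` block removes the local working copy.


store = {("inbox", "a.csv"): b"data"}  # (bucket, path) -> bytes
copy_via_engine(store, "inbox", "a.csv", "backup", "copies/a.csv")
```

Unlike a move, the source file is left in place.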

Sample Flow

This example shows how to download a file from Dropbox and then upload it to Amazon S3.

Step 1: Get your Access Key and Secret Key from Amazon

Log in to your AWS Console. Navigate to the IAM Dashboard and select your username. Click Create Access Key and download the file containing the Access Key and Secret Key.

Step 2: Create S3 Bucket

Navigate to the S3 Configuration dashboard.

Enter the Bucket Name and select the Region you want to use. S3 Bucket names are globally unique: the same bucket name cannot be used again in another region. Choose a region close to your location to reduce latency.

Create a bucket and verify it.
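The global uniqueness of bucket names can be pictured as a single registry keyed by name only, with the region stored as an attribute. This is a toy model for illustration. The class `BucketRegistry` is hypothetical and the bucket name is made up.

```python
class BucketRegistry:
    """Toy model: one global namespace of bucket names across all regions."""

    def __init__(self):
        self._region_by_name = {}

    def create(self, name, region):
        # The lookup ignores the region, so a name can exist only once.
        if name in self._region_by_name:
            taken_in = self._region_by_name[name]
            raise ValueError(f"bucket name {name!r} is already taken (region {taken_in})")
        self._region_by_name[name] = region

    def region_of(self, name):
        return self._region_by_name[name]


registry = BucketRegistry()
registry.create("my-talend-files", "eu-west-1")
```

Creating `my-talend-files` a second time, in any region, raises an error in this model.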

Step 3: Create S3 Connection

Log in to your Talend Integration Cloud account and create a new AWS S3 Connection (as explained in the section above).

Step 4: Create Dropbox Connection

Create a Dropbox connection if one does not already exist. You need access to the Dropbox developer site to get the token that the Dropbox action uses to access your files in Dropbox.

Step 5: Create the Flow

With the connections created, build a flow that copies the file from Dropbox and uploads it to AWS S3. Create a new flow in Flowbuilder and use Dropbox as the source connection. Set 'File Path Source' to the path of the file that you want to copy.

Place the awss3_file_upload_target action as the Target. Click Mapper and modify two values:

  • File_bucket_dest
  • File_path_dest

Click each value and set its default value. For File_bucket_dest, enter the destination bucket name as defined in S3 in the earlier step.

For File_path_dest, enter the path where the file should be uploaded, including the filename.
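The mapping step can be pictured as a function that turns the record coming from the Dropbox source into the input of the upload target. `File_bucket_dest` and `File_path_dest` are the mapper fields named above. The function name, the source record fields `path` and `content`, and the sample values are assumptions made for this sketch.

```python
def map_to_upload_input(source_record, bucket_default, folder_default):
    """Build the upload target's input from a source record and two defaults."""
    # Keep only the filename from the source path.
    file_name = source_record["path"].rsplit("/", 1)[-1]
    return {
        "File_bucket_dest": bucket_default,
        # The destination path includes the filename.
        "File_path_dest": folder_default.rstrip("/") + "/" + file_name,
        "content": source_record["content"],
    }


dropbox_record = {"path": "/reports/q1.csv", "content": b"a,b\n"}
upload_input = map_to_upload_input(dropbox_record, "my-talend-files", "incoming/")
```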

Step 6: Run the Flow

Run the flow by clicking the Test icon. Select Talend Cloud as the execution environment.

Step 7: Verify the results

Verify the result by logging in to your S3 account. The file should be present in the bucket configured in the earlier step.

Step 8: Cleanup in AWS

It is advisable to remove unused S3 Buckets in Amazon S3 to avoid incurring extra charges.