Using Amazon S3 with Talend Cloud Management Console - Cloud

Author: Irshad Burtally
Version: Cloud
Product: Talend Cloud
Platform: Talend Management Console, Talend Studio

Talend Cloud Management Console is a secure cloud integration platform-as-a-service (iPaaS). Jobs designed in Talend Studio are published to it as Tasks, which run on Talend Cloud Engines.

Amazon S3

Amazon Simple Storage Service (S3) is storage for the Internet. It is designed to make web-scale computing easier for developers. Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of object data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize the benefits of scale and to pass those benefits on to developers.

Amazon S3 stores data as objects (files) in buckets; it is object storage, not a database. Note also that Amazon Glacier uses the Amazon S3 data storage infrastructure for archiving.

Amazon S3 components in Talend Studio

Talend Studio provides several components in the Palette that are built around the operations exposed by Amazon S3.

These operations are:

  • Create an S3 bucket
  • Delete an S3 bucket
  • Check whether an S3 bucket exists
  • List all S3 buckets
  • Put files into an S3 bucket
  • Get files from an S3 bucket
  • List files in an S3 bucket
  • Delete files from an S3 bucket
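Amazon S3 rejects a create-bucket request whose name breaks its naming rules. The Python sketch below checks a subset of the rules for general purpose buckets before the request is sent: 3 to 63 characters, only lowercase letters, digits, dots and hyphens, a letter or digit at both ends, no adjacent dots, and no IP-address-style names. The function name is hypothetical and the check is deliberately not exhaustive; the AWS bucket naming rules remain the reference.

```python
import re

# Subset of the Amazon S3 naming rules for general purpose buckets.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")  # 3-63 chars in total
_IP_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")  # e.g. 192.168.5.4


def is_valid_bucket_name(name: str) -> bool:
    """Hypothetical pre-check run before creating a bucket (not exhaustive)."""
    if not _NAME_PATTERN.fullmatch(name):
        return False  # wrong length, forbidden character, or bad first/last character
    if ".." in name:
        return False  # adjacent periods are not allowed
    if _IP_PATTERN.fullmatch(name):
        return False  # names formatted as an IP address are not allowed
    return True
```

For example, my-talend-bucket passes, while My_Bucket fails because of the uppercase letter and the underscore.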

The sections below describe how these components are used in Talend Cloud Management Console Tasks.

Amazon S3 connection

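To create the connection parameters in Talend Studio, right-click the Contexts node in the Repository and click Create Context Group.

Name the group aws_context and create three context variables in it, including aws_access_key and aws_secret_key, which are described in the sections below.

Use this context group in your Talend Studio Jobs and, once the Jobs are published, in Talend Cloud Management Console.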

Tasks that use this native S3 connection run on Talend Cloud Management Console Engines, which are hosted on Talend's AWS infrastructure rather than in your own AWS account. The recommended way to access S3 from these Engines is therefore with an access key and a secret key. For more information on access keys, see the AWS article Managing Access Keys for your AWS Account.
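Rather than typing the keys into Job code, the aws_context values can be supplied from outside the Job and kept out of logs. The Python sketch below illustrates the idea: it reads the two keys from a mapping such as os.environ (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the standard AWS environment variable names), fails early when one is missing, and masks the secret before it is printed. Both function names are hypothetical and are not part of Talend Studio.

```python
import os
from typing import Mapping


def load_aws_context(env: Mapping[str, str] = os.environ) -> dict:
    """Build the aws_context values from environment variables (hypothetical helper)."""
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail before any S3 call instead of getting an authentication error later.
        raise KeyError(f"missing credentials: {', '.join(missing)}")
    return {
        "aws_access_key": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_key": env["AWS_SECRET_ACCESS_KEY"],
    }


def mask(secret: str) -> str:
    """Hide all but the last 4 characters so the secret never appears in full in logs."""
    if len(secret) <= 4:
        return "*" * len(secret)  # too short to reveal any part safely
    return "*" * (len(secret) - 4) + secret[-4:]
```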

Amazon S3 files list

This Job returns a list of the files stored in Amazon S3. It creates a connection to Amazon S3, retrieves the list of files, filters it using the Folder and File Type context parameters, and then writes the name of each file to the output flow.
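Amazon S3 returns at most 1,000 keys per ListObjectsV2 request, so listing a large folder means following continuation tokens until a response is no longer truncated. The Python sketch below shows that loop. fetch_page is a hypothetical callable; with the AWS SDK for Python (boto3) it could wrap client.list_objects_v2, whose responses carry the Contents, IsTruncated and NextContinuationToken fields read here.

```python
from typing import Callable, Iterator, Optional

# A page has the shape of a ListObjectsV2 response:
# {"Contents": [{"Key": ...}, ...], "IsTruncated": bool, "NextContinuationToken": str}
FetchPage = Callable[[Optional[str]], dict]


def iter_keys(fetch_page: FetchPage) -> Iterator[str]:
    """Yield every object key, requesting further pages while the listing is truncated."""
    token = None  # the first request carries no continuation token
    while True:
        page = fetch_page(token)
        for obj in page.get("Contents", []):  # "Contents" is absent when a page is empty
            yield obj["Key"]
        if not page.get("IsTruncated"):
            return
        token = page["NextContinuationToken"]


# Example wiring with boto3 (not executed here):
#   fetch = lambda token: client.list_objects_v2(
#       Bucket=bucket, Prefix=folder, **({"ContinuationToken": token} if token else {}))
#   keys = list(iter_keys(fetch))
```

boto3 also offers client.get_paginator("list_objects_v2"), which runs the same loop internally.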

Context parameters

S3 connection (the aws_context group described above):

  • aws_access_key: Access key ID of the AWS account used to access Amazon S3.
  • aws_secret_key: Secret access key that belongs to this access key ID.

General:

  • Bucket: Name of the source bucket where the files are stored.
  • Folder: Path of the folder whose files are listed.
  • File Type: Type of the files to be listed. To list all the files in the folder, use * as the file type.

Output schema:

  • Name of the bucket where the file is stored.
  • Path to the file to be downloaded.
  • Content of the file to be downloaded.
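The filtering step of this Job can be sketched as follows. S3 has no real folders: a "folder" is a key prefix ending in /, and listing a prefix also returns keys in nested subfolders. This Python sketch assumes that Folder is such a prefix and that File Type is an extension such as csv, with * meaning every file; the function name and these semantics are assumptions, not the exact behavior of the Talend components.

```python
import posixpath
from fnmatch import fnmatchcase
from typing import Iterable, List


def filter_file_names(keys: Iterable[str], folder: str, file_type: str) -> List[str]:
    """Return the names of the files under `folder` whose extension matches `file_type`."""
    folder = folder.strip("/")
    prefix = folder + "/" if folder else ""  # an empty Folder means the whole bucket
    pattern = "*" if file_type == "*" else "*." + file_type.lstrip(".")
    names = []
    for key in keys:
        if not key.startswith(prefix) or key.endswith("/"):
            continue  # outside the folder, or a zero-byte "folder" placeholder object
        name = posixpath.basename(key)  # S3 keys always use "/" as the separator
        if fnmatchcase(name, pattern):  # case-sensitive, like S3 keys
            names.append(name)
    return names
```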

Amazon S3 files upload

This Job uploads files to Amazon S3.

Context parameters

Connection:

  • aws_access_key: Access key ID of the AWS account used to access Amazon S3.
  • aws_secret_key: Secret access key that belongs to this access key ID.

General:

  • Name of the target bucket where the file is to be stored.
  • Path to the target file to be uploaded.
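In S3 the target path is part of the object key, not a directory that must exist beforehand. The Python sketch below shows one way to build the key from the target path and the local file name; the helper is hypothetical. It strips leading and trailing slashes, because a leading / would otherwise become part of the key.

```python
import posixpath
from pathlib import PurePath


def target_key(target_path: str, local_file: str) -> str:
    """Hypothetical helper: object key for uploading `local_file` under `target_path`."""
    folder = target_path.strip("/")
    name = PurePath(local_file).name  # keep the file name, drop the local directories
    return posixpath.join(folder, name) if folder else name


# Example with boto3 (not executed here):
#   client.upload_file(local_file, bucket, target_key(target_path, local_file))
```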

Amazon S3 file move

This Job moves files on Amazon S3. It takes the following context parameters.

Context parameters

Connection:

  • aws_access_key: Access key ID of the AWS account used to access Amazon S3.
  • aws_secret_key: Secret access key that belongs to this access key ID.

General:

  • Name of the source bucket where the file is stored.
  • Path to the source file to be copied.
  • Name of the target file.
  • Path to the target file.
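Amazon S3 has no rename or move operation, so moving a file means copying the object to its new bucket and key and then deleting the original. The Python sketch below models that order of operations on an in-memory dictionary standing in for S3 (an assumption made so the example runs without AWS); the comments name the boto3 calls that would perform each step against real S3.

```python
from typing import Dict

# In-memory stand-in for S3: bucket name -> {object key: content}.
Store = Dict[str, Dict[str, bytes]]


def move_object(store: Store, src_bucket: str, src_key: str, dst_bucket: str, dst_key: str) -> None:
    """Move an object the way S3 requires: copy it, then delete the original."""
    content = store[src_bucket][src_key]  # raises KeyError if the source object is missing
    # Step 1: copy
    # (boto3: client.copy_object(Bucket=dst_bucket, Key=dst_key,
    #                            CopySource={"Bucket": src_bucket, "Key": src_key}))
    store.setdefault(dst_bucket, {})[dst_key] = content
    # Step 2: delete the source, only after the copy succeeded
    # (boto3: client.delete_object(Bucket=src_bucket, Key=src_key))
    if (src_bucket, src_key) != (dst_bucket, dst_key):
        del store[src_bucket][src_key]
```

Deleting only after the copy succeeds means a failure between the two steps leaves a duplicate rather than a lost file.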

Amazon S3 file download

This Job downloads files stored on Amazon S3 into the temporary directory of the Cloud Engine. The Task should then process each downloaded file and remove it from the temporary directory.

Context parameters

Connection:

  • aws_access_key: Access key ID of the AWS account used to access Amazon S3.
  • aws_secret_key: Secret access key that belongs to this access key ID.

General:

  • Name of the source bucket where the file is stored.
  • Path to the source file to be downloaded.

Output schema:

  • Name of the source bucket where the file is stored.
  • Path to the source file to be downloaded.
  • Content of the file to be downloaded.
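Because downloaded files should not be left behind on the Cloud Engine, the download, the processing and the clean-up belong together, with the clean-up guaranteed even when processing fails. The Python sketch below shows that pattern with a private temporary directory. download and process are hypothetical callables; with boto3, download could call client.download_file(bucket, key, str(path)).

```python
import posixpath
import shutil
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_downloaded_file(key: str, download: Callable[[Path], object], process: Callable[[Path], T]) -> T:
    """Download `key` into a fresh temporary directory, process it, and always clean up."""
    tmp_dir = Path(tempfile.mkdtemp(prefix="s3_download_"))
    try:
        local_path = tmp_dir / posixpath.basename(key)  # keep only the file name from the key
        download(local_path)
        return process(local_path)
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)  # runs even if download or process raises
```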

Amazon S3 file delete

This Job deletes files stored on Amazon S3.

Context parameters

Connection:

  • aws_access_key: Access key ID of the AWS account used to access Amazon S3.
  • aws_secret_key: Secret access key that belongs to this access key ID.

General:

  • Name of the source bucket where the file is stored.
  • Path to the source file to be deleted.
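When the Job deletes many files, the S3 DeleteObjects API can remove up to 1,000 keys per request instead of sending one request per key. The Python sketch below splits a list of keys into request payloads of that size, in the shape expected by the Delete parameter of boto3's client.delete_objects; the helper name is hypothetical.

```python
from typing import Iterable, Iterator, List

MAX_KEYS_PER_DELETE = 1000  # DeleteObjects accepts at most 1,000 keys per request


def delete_batches(keys: Iterable[str], batch_size: int = MAX_KEYS_PER_DELETE) -> Iterator[dict]:
    """Group keys into DeleteObjects payloads of at most `batch_size` objects each."""
    batch: List[dict] = []
    for key in keys:
        batch.append({"Key": key})
        if len(batch) == batch_size:
            yield {"Objects": batch, "Quiet": True}  # Quiet: the response lists only failures
            batch = []
    if batch:
        yield {"Objects": batch, "Quiet": True}  # remaining keys in a final, smaller batch


# Example with boto3 (not executed here):
#   for payload in delete_batches(keys):
#       client.delete_objects(Bucket=bucket, Delete=payload)
```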

Publish and run on Cloud

  1. To publish a Job to Talend Cloud, right-click the Job in Talend Studio and select Publish to Cloud.
  2. Select the workspace to publish the Job to and click Finish.
  3. When publishing completes, Talend Studio displays a status message.
  4. Log in to Talend Cloud Management Console and check that the Task has been created.
  5. Expand Advanced Parameters and check the context parameter values.
  6. Click Run Now to run the Task.
  7. Click View Logs to check the Task execution logs.