Going further: Uploading your dataset to S3 - Cloud

Talend Cloud Pipeline Designer Getting Started Guide

If you have an Amazon S3 account, you might want to go further. Once you have uploaded a file to S3, you can create a connection to this S3 bucket and retrieve the dataset from Talend Cloud Pipeline Designer.

You will then be able to reproduce the use case with the dataset hosted in Amazon S3.

Before you begin

  • Make sure your user or user group has the correct permissions to access the Amazon S3 resources.

    If you do not have these permissions, try one of the following options:
    1. (recommended) Ask the administrator who manages your Amazon account to grant your user the correct S3 permissions.
    2. If you are allowed to do so, implement the access policy yourself by following the Amazon documentation.
    3. (not recommended) Attach the AmazonS3FullAccess policy to your group or user through the IAM console. This allows you to read and write to S3 resources without being restricted to a specific bucket. However, this is a quick fix that Talend does not recommend.
    Note: If you try to access S3 resources without sufficient permissions, the default error displayed is Bad Gateway.
  • Retrieve the financial_transactions.avro file from the Downloads tab in the left panel of this page.
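
If you implement the access policy yourself, a policy scoped to a single bucket is a narrower alternative to AmazonS3FullAccess. The sketch below builds such a policy document in Python. The bucket name my-talend-bucket and the exact set of actions are assumptions for illustration, not requirements stated by Talend, so check them against the Amazon documentation before attaching anything.

```python
import json


def bucket_scoped_policy(bucket: str) -> dict:
    """Build an IAM policy document limited to a single S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-talend-bucket" is a hypothetical bucket name.
    print(json.dumps(bucket_scoped_policy("my-talend-bucket"), indent=2))
```

The printed JSON can be pasted into the policy editor of the IAM console. Your pipelines may need additional actions, which is why asking your administrator remains the recommended option.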

Procedure

  1. Upload the financial_transactions.avro file to your Amazon S3 bucket as described in the Amazon S3 documentation.
  2. On the Home page of Talend Cloud Pipeline Designer, click Connections > ADD CONNECTION.
  3. In the panel that opens, give a name to your connection, for example s3 connection.
  4. Select your Remote Engine Gen2 in the Engine list.
    Note: If you want to use a Remote Engine Gen2, you need to create it from Talend Cloud Management Console. If the engine exists but does not have the AVAILABLE status, which means it is up and running, you will not be able to select a connection type in the list or save the new connection. The list of available connection types depends on the engine you have selected.
  5. Select S3 connection in the Connection type list.
  6. Check your connection and click ADD DATASET to point to the file that you have previously uploaded to your S3 bucket.
  7. In the Add a new dataset panel, fill in the connection information to your S3 bucket:
    1. Give a display name to your dataset, financial data on S3 for example.
    2. Add a description if needed.
    3. In the Bucket field, select or type the name of your S3 bucket.
    4. In the Path field, type in the path to the financial_transactions.avro file you have previously uploaded to your S3 bucket.
    5. In the Format list, click AUTO DETECT to automatically detect the format or select Avro in the list.
  8. Click VIEW SAMPLE to check that your data is valid and can be previewed.
  9. Click VALIDATE to save your dataset.
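
Step 1 of the procedure can also be scripted instead of done through the Amazon S3 console. The following is a minimal sketch, assuming the third-party boto3 library is installed and AWS credentials are already configured on your machine. The function names, the bucket name, and the datasets prefix are hypothetical. It first checks that the local file starts with the magic bytes of an Avro object container file, then uploads it and returns the object key, which is presumably the value to type in the Path field.

```python
from pathlib import Path

AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def looks_like_avro(path: Path) -> bool:
    """Return True if the file starts with the Avro container magic bytes."""
    with path.open("rb") as handle:
        return handle.read(len(AVRO_MAGIC)) == AVRO_MAGIC


def object_key(path: Path, prefix: str = "") -> str:
    """Build the S3 object key from an optional prefix and the file name."""
    cleaned = prefix.strip("/")
    return f"{cleaned}/{path.name}" if cleaned else path.name


def upload_dataset(path: Path, bucket: str, prefix: str = "") -> str:
    """Upload the file to S3 and return the object key that was used."""
    if not looks_like_avro(path):
        raise ValueError(f"{path} does not look like an Avro container file")
    # Imported here so the two helpers above work without boto3 installed.
    import boto3

    key = object_key(path, prefix)
    boto3.client("s3").upload_file(str(path), bucket, key)
    return key
```

For example, calling upload_dataset(Path("financial_transactions.avro"), "my-talend-bucket", "datasets") would store the file under the key datasets/financial_transactions.avro. The header check is only a quick sanity test and does not replace the VIEW SAMPLE step.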

Results

On the Datasets page, the new dataset is added to the list and can be used to reproduce the use case you have created previously.
Before executing this pipeline, go to the configuration tab of the destination dataset and select whether you want to overwrite the existing data on S3 or merge the new data with it.

Once your pipeline is executed, the updated data will be visible in the file located on Amazon S3.