Adding a dataset from Amazon S3 - Cloud

Talend Cloud Data Preparation User Guide

author: Talend Documentation Team
EnrichVersion: Cloud
EnrichProdName: Talend Cloud
task: Data Quality and Preparation > Cleansing data
EnrichPlatform: Talend Data Preparation

Talend Data Preparation can connect to various data sources to create new datasets.

In this example, you want to prepare some customer data that is stored on Amazon S3. You will enter your Amazon S3 connection information directly in the Talend Data Preparation interface and create a new dataset from this data.

Procedure

  1. In the Datasets view of the Talend Data Preparation homepage, click the white arrow next to the Add Dataset button.
  2. Select Amazon S3.

    The Add Amazon S3 dataset form opens.

  3. In the Dataset name field, enter a name for your dataset, for example Amazon S3 dataset.
  4. Click Test connection.
    If the connection is successful, the second part of the form is displayed, where you can select the object to import. If the connection fails, an error message explains why.
  5. From the Bucket drop-down list, select the location of your data in Amazon S3.
    You don't need to select a region. It is determined automatically from your datacenter.
  6. In the Path field, enter the path to the dataset to import from your bucket.
  7. Select the format, the record and field delimiters, the text enclosure and escape characters, and the encoding of your data in the corresponding fields and drop-down lists.
  8. Click the Add dataset button at the end of the form.
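
The parsing options selected in step 7 are the usual parameters of a delimited text file. As a rough illustration only, and not a description of how Talend Data Preparation works internally, the following Python sketch uses the standard csv module to show how the encoding, field delimiter, text enclosure and escape character determine how raw bytes are split into records and fields. The function name parse_delimited and the sample data are hypothetical.

```python
import csv
import io


def parse_delimited(raw, encoding="utf-8", field_delimiter=";",
                    text_enclosure='"', escape_char="\\"):
    """Decode raw bytes and split them into records and fields.

    Hypothetical helper for illustration; the parameter names mirror
    the options of the form, not any Talend API.
    """
    text = raw.decode(encoding)
    reader = csv.reader(
        io.StringIO(text, newline=""),  # let csv handle record delimiters
        delimiter=field_delimiter,
        quotechar=text_enclosure,
        escapechar=escape_char,
        doublequote=False,  # enclosure characters are escaped, not doubled
    )
    return list(reader)


# Made-up customer data: one field contains the delimiter,
# another contains escaped enclosure characters.
sample = (
    'id;name;city\n'
    '1;"Smith; John";Paris\n'
    '2;"Ana \\"AJ\\" Jones";Lyon\n'
).encode("utf-8")

rows = parse_delimited(sample)
for row in rows:
    print(row)
```

If the wrong delimiter or enclosure is selected, fields such as "Smith; John" are split in the wrong place, which is why these settings should match the file stored in your bucket.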

Results

When the import is complete, the data extracted from Amazon S3 opens directly in the grid, and you can start working on your preparation as usual.

The data remains stored in Amazon S3. Talend Data Preparation only retrieves a sample on demand.

The dataset is added to the list in the Datasets view of the homepage.
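
To illustrate the idea of retrieving only a sample while the full data stays at the source, here is a minimal Python sketch. It assumes a generic text stream standing in for the remote file and reads just the first few records lazily. It is not Talend's actual mechanism, and the function name read_sample, the sample size and the generated data are hypothetical.

```python
import csv
import io
from itertools import islice


def read_sample(stream, sample_size):
    """Return the header and at most sample_size records from a stream.

    Records beyond the sample are never read from the stream.
    """
    reader = csv.reader(stream)
    header = next(reader)
    records = list(islice(reader, sample_size))
    return header, records


# Stand-in for a large remote file: 1000 made-up customer records.
source = io.StringIO(
    "id,name\n" + "".join(f"{i},customer_{i}\n" for i in range(1, 1001))
)

header, records = read_sample(source, 5)
print(header)
print(records)
```

Only the header and the first five records are consumed. The remaining lines stay unread in the source stream.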