Adding a dataset from Amazon S3 - 8.0

Talend Data Preparation User Guide

Version: 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Preparation
Last publication date: 2024-03-26

Talend Data Preparation can connect to various data sources to create new datasets.

In this example, you want to prepare customer data that is stored on Amazon S3. You will enter your Amazon S3 connection information directly in the Talend Data Preparation interface and create a new dataset from this data.

Procedure

  1. In the Datasets view of the Talend Data Preparation homepage, click the white arrow next to the Add Dataset button.
  2. Select Amazon S3.

    The Add Amazon S3 dataset form opens.

  3. In the Dataset name field, enter a name for your dataset, for example Amazon S3 dataset.
  4. Enter your Amazon S3 connection information, then click Test connection.
    If the connection is successful, the second part of the form is displayed, where you can select the object to import. If the connection fails, an error message explains why.
  5. From the Bucket drop-down list, select the bucket that contains your data in Amazon S3.
  6. In the Path field, enter the path to the dataset to import from your bucket.
  7. In the corresponding fields and drop-down lists, select the format, record delimiter, field delimiter, text enclosure character, escape character, and encoding of your data.
  8. Click the Add dataset button at the end of the form.
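
To get a feel for what the format settings in step 7 control, the following minimal Python sketch parses a small delimited sample with an explicit encoding, field delimiter, text enclosure, and escape character. It is only an illustration using Python's standard csv module, not the parser that Talend Data Preparation uses; the sample data and the parse_delimited function name are hypothetical.

```python
import csv
import io

# Hypothetical raw bytes, standing in for a delimited file stored in a bucket.
raw = 'id;name;city\r\n1;"Smith; John";Paris\r\n2;"O\\"Neil";Zürich\r\n'.encode("utf-8")


def parse_delimited(data, encoding="utf-8", field_delimiter=";",
                    text_enclosure='"', escape_char="\\"):
    """Decode bytes and split them into rows using the given format settings."""
    text = data.decode(encoding)
    reader = csv.reader(
        io.StringIO(text, newline=""),  # keep line endings as-is for the csv module
        delimiter=field_delimiter,      # separates fields within a record
        quotechar=text_enclosure,       # wraps fields that contain the delimiter
        escapechar=escape_char,         # makes the next character literal
        doublequote=False,              # rely on the escape character, not doubled quotes
    )
    # The csv reader treats \r\n or \n as the end of a record.
    return list(reader)


rows = parse_delimited(raw)
for row in rows:
    print(row)
```

If the settings do not match the file, for example a comma is chosen as the field delimiter for this semicolon-separated sample, each record comes back as a single field, which is a typical symptom of a wrong delimiter choice.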

Results

When the import is done, the data extracted from Amazon S3 opens directly in the grid, and you can start working on your preparation as usual.

The data remains stored in Amazon S3. Talend Data Preparation only retrieves a sample on demand.
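
The general idea of working on a sample rather than on the full source can be sketched as follows. This is a conceptual illustration only, assuming a comma-separated text stream; it does not describe how Talend Data Preparation actually fetches its sample, and the read_sample function and the generated data are hypothetical.

```python
import csv
import io
from itertools import islice


def read_sample(stream, max_rows):
    """Return at most max_rows parsed rows, reading the stream lazily."""
    reader = csv.reader(stream)
    # islice stops pulling rows from the reader once max_rows have been read.
    return list(islice(reader, max_rows))


# Hypothetical stand-in for a large remote object: a header plus 1,000 data rows.
source = io.StringIO(
    "id,name\n" + "".join(f"{i},customer{i}\n" for i in range(1000))
)

sample = read_sample(source, 5)
for row in sample:
    print(row)
```

Only the first five rows are parsed here; the rest of the stream is left untouched, which is why a sample stays cheap even when the source is large.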

The dataset is added to the list in the Datasets view of the homepage.