Talend Data Preparation can connect
to various data sources to create new datasets.
In this example, you want to prepare some customer data that is stored on Amazon S3. You
will enter your Amazon S3 connection information directly in the Talend Data Preparation interface and create a
new dataset from this data.
Procedure
-
In the Datasets view of the Talend Data Preparation homepage, click
the white arrow next to the Add Dataset button.
-
Select Amazon S3.
The Add Amazon S3
dataset form opens.
-
In the Dataset name field, enter a name for your
dataset, for example Amazon S3 dataset.
-
Click Test connection.
If the connection is successful, the second part of the form
is displayed, where you can select the object to import. If the connection is
not successful, an error message is displayed, detailing why the connection
failed.
-
From the Bucket drop-down list, select the location of
your data in Amazon S3.
-
In the Path field, enter the path to the dataset to
import from your bucket.
-
Select the format, the record and field delimiters, the text enclosure
and escape characters, and the encoding of your data in the corresponding fields
and drop-down lists.
-
Click the Add dataset button at the end of the
form.
Results
When the import is done, the data extracted
from Amazon S3 opens directly in the grid and you can start working on your preparation
as you would with any other dataset.
The data remains stored in Amazon S3.
Talend Data Preparation only retrieves a
sample on demand.
The dataset is added to the list in the
Datasets view of the homepage.
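To see how the format settings from the procedure and the sampling behavior described above interact, here is a minimal Python sketch. It is an illustration only, not how Talend Data Preparation works internally. The function read_sample, its parameter names, the default values and the customer records are all hypothetical, and an in-memory byte stream stands in for an object stored in an Amazon S3 bucket.

```python
import csv
import io
from itertools import islice


def read_sample(raw, encoding="utf-8", field_delimiter=";",
                text_enclosure='"', escape_char="\\", sample_size=3):
    """Decode a byte stream and return at most sample_size parsed records.

    Hypothetical helper: the parameters mirror the fields of the form
    (encoding, field delimiter, text enclosure, escape character).
    """
    # newline="" lets the csv module detect the record delimiter itself
    # (Python's csv reader recognizes \n, \r and \r\n when reading).
    text = io.TextIOWrapper(raw, encoding=encoding, newline="")
    reader = csv.reader(
        text,
        delimiter=field_delimiter,
        quotechar=text_enclosure,
        escapechar=escape_char,
    )
    # islice stops after sample_size records, so later records are not parsed.
    return list(islice(reader, sample_size))


# Made-up customer data standing in for the object selected in the Path field.
data = 'id;name\n1;"Doe; Jane"\n2;Smith\n3;Lee\n'.encode("utf-8")
sample = read_sample(io.BytesIO(data))
print(sample)
```

In this sketch, the text enclosure keeps the semicolon inside "Doe; Jane" from being read as a field delimiter, and only the header and the first two records are returned while the source data is left untouched.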