Creating a dataset - Cloud

Talend Cloud Data Preparation User Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Preparation
Last publication date: 2024-04-15
How to create a dataset from scratch.

Procedure

  1. Go to Datasets > Add dataset.
  2. In the Add a new dataset panel, name your dataset and select the connection in which you want to create it.
    If the connection you need does not exist yet, you can create it directly from the connection drop-down list.
  3. Add a description if needed, and fill in the required properties of the dataset.
    • For S3 and HDFS file storage connections, an Auto detect button allows you to automatically detect and fill in the format of your data (CSV, Excel, Avro, or Parquet).

    • For database datasets, the query and table types are not interchangeable, because a query type dataset cannot be used as a Destination dataset. If you try to change the database configuration to another type after saving it, a check is triggered on your pipeline to determine whether the change is possible.

  4. (Optional) Click View sample to preview the first records of your dataset.
  5. Click Validate to save your dataset.
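
To give an idea of what format auto-detection (step 3) can involve, here is a minimal Python sketch that guesses one of the four listed formats from the first bytes of a file. It is not how Talend Cloud implements the Auto detect button; the function names `guess_format` and `guess_file_format` are hypothetical, and the only facts relied on are the well-known file signatures of Parquet, Avro, and Excel workbooks.

```python
# Illustrative only: NOT the Talend Cloud implementation of Auto detect.
# Leading "magic bytes" of the binary formats listed in step 3.
PARQUET_MAGIC = b"PAR1"            # Parquet files start with PAR1
AVRO_MAGIC = b"Obj\x01"            # Avro object container files
ZIP_MAGIC = b"PK\x03\x04"          # .xlsx workbooks are ZIP archives
OLE2_MAGIC = b"\xd0\xcf\x11\xe0"   # legacy .xls workbooks (OLE2 compound file)


def guess_format(header: bytes) -> str:
    """Guess a file format from its first bytes (hypothetical helper)."""
    if header.startswith(PARQUET_MAGIC):
        return "parquet"
    if header.startswith(AVRO_MAGIC):
        return "avro"
    # Assumption: any ZIP or OLE2 container is treated as an Excel workbook.
    # A real detector would inspect the container contents to be sure.
    if header.startswith(ZIP_MAGIC) or header.startswith(OLE2_MAGIC):
        return "excel"
    # No binary signature matched: fall back to delimited text.
    return "csv"


def guess_file_format(path: str) -> str:
    """Read the first bytes of a local file and guess its format."""
    with open(path, "rb") as f:
        return guess_format(f.read(8))
```

For example, `guess_format(b"id,name\n1,Ada")` falls through every signature check and returns `"csv"`, which is why text formats are usually detected last.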

Results

The new dataset is added to the list on the Datasets page and is ready to be used.
Once the dataset is created, you can open its detailed view to display a sample of your data in different formats:
  • Grid: displays the first 10 000 records of your data in tabular form
  • Hierarchy: displays the first 10 000 records of your data in a tree-like structure
  • Raw: displays an untouched and unfiltered version of the first 10 000 records of your data
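
The difference between the three views can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's implementation: it assumes the sample arrives as newline-delimited JSON records, and the names `take_sample`, `raw_view`, `hierarchy_view`, and `grid_view` are hypothetical.

```python
import json
from itertools import islice

SAMPLE_LIMIT = 10_000  # sample size shown in the dataset detailed view


def take_sample(lines, limit=SAMPLE_LIMIT):
    """Return at most `limit` raw records from an iterable of lines."""
    return list(islice(lines, limit))


def raw_view(sample):
    """Raw: the records exactly as read, with no parsing or filtering."""
    return list(sample)


def hierarchy_view(sample):
    """Hierarchy: each record parsed into a nested (tree-like) structure."""
    return [json.loads(line) for line in sample]


def grid_view(sample):
    """Grid: column names plus one row per record.

    Nested fields are flattened into dotted column names (an assumption
    made for this sketch).
    """
    def flatten(value, prefix=""):
        if isinstance(value, dict):
            flat = {}
            for key, child in value.items():
                flat.update(flatten(child, f"{prefix}{key}."))
            return flat
        return {prefix[:-1]: value}

    records = [flatten(json.loads(line)) for line in sample]
    columns = []
    for record in records:
        for name in record:
            if name not in columns:
                columns.append(name)
    rows = [[record.get(name) for name in columns] for record in records]
    return columns, rows
```

Given two records such as `{"id": 1, "address": {"city": "Paris"}}`, the hierarchy view keeps `address` as a nested node, while the grid view exposes it as an `address.city` column.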