Creating a dataset from a Talend Job

Talend Data Preparation User Guide

author: Talend Documentation Team
EnrichVersion: 6.3, 2.0
EnrichProdName: Talend Data Fabric, Talend Real-Time Big Data Platform, Talend Big Data Platform, Talend Big Data, Talend MDM Platform, Talend Data Integration, Talend Data Services Platform, Talend Data Management Platform, Talend ESB
task: Data Quality and Preparation > Cleansing data
EnrichPlatform: Talend Data Preparation

You can use a Talend Job with any input flow to create a dataset in Talend Data Preparation.

To create a dataset from Talend Studio, design a Job that uses the tDatasetOutput component as its output, with that component set to Create mode. Any type of input flow works, but the simplest possible Job design is the following:

Procedure

  1. In the design workspace, add the tRowGenerator component and click the Component tab to define its basic settings.
  2. Click the [...] button next to RowGenerator Editor to define the schema of your data and choose the number of rows to generate.
  3. In the design workspace, add the tDatasetOutput component and click the Component tab to define its basic settings.
  4. Click Sync Column to retrieve the schema from the previous component.
  5. In the URL field, type the URL of the Talend Data Preparation web application, enclosed in double quotes. Port 9999 is the default port for Talend Data Preparation.
  6. In the Email field, type the email address that you use to log in to the Talend Data Preparation web application, enclosed in double quotes.
  7. In the Password field, type your password for the Talend Data Preparation web application, enclosed in double quotes.

    The user these credentials belong to becomes the owner of the newly created dataset, and is also the user who can share it with other users.

  8. Select the Create mode from the Mode drop-down list.

    Setting the mode to Update instead lets you use the input flow to update the existing dataset specified in the Dataset Name field.

  9. In the Dataset Name field, enter a name for your dataset, enclosed in double quotes: create_dataset_from_job in this example.
  10. Link the two components together using a Row > Main link.
  11. Save your Job and press F6 to execute it.
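The Job built in the procedure above is a two-component flow: tRowGenerator produces rows that follow a schema, Sync Column gives tDatasetOutput the same schema, and the Row > Main link carries each row to the output. As a rough mental model only, the following Java sketch reproduces that data path in memory. `JobFlowSketch`, `RowGenerator`, `DatasetOutput`, and their methods are hypothetical names: they are not Talend APIs or the code that Talend Studio generates, and nothing is sent to Talend Data Preparation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the Job: tRowGenerator -> Row > Main -> tDatasetOutput.
// These classes are illustrative only; they are not Talend APIs or generated Job code.
public class JobFlowSketch {

    // Stand-in for tRowGenerator: a schema (column names) and a row count,
    // the two things configured in the RowGenerator Editor.
    static final class RowGenerator {
        final List<String> schema;
        final int rowCount;

        RowGenerator(List<String> schema, int rowCount) {
            this.schema = List.copyOf(schema);
            this.rowCount = rowCount;
        }

        // Builds rowCount rows; each value is a placeholder made from the column name and row index.
        List<Map<String, String>> generate() {
            List<Map<String, String>> rows = new ArrayList<>();
            for (int i = 0; i < rowCount; i++) {
                Map<String, String> row = new LinkedHashMap<>();
                for (String column : schema) {
                    row.put(column, column + "_" + i);
                }
                rows.add(row);
            }
            return rows;
        }
    }

    // Stand-in for tDatasetOutput: it keeps a copy of the upstream schema and the rows it receives.
    static final class DatasetOutput {
        List<String> schema = List.of();
        final List<Map<String, String>> received = new ArrayList<>();

        // Plays the role of Sync Column: take the schema of the previous component.
        void syncColumns(RowGenerator upstream) {
            schema = upstream.schema;
        }

        // Rejects rows whose columns do not match the synced schema.
        void accept(Map<String, String> row) {
            if (!row.keySet().equals(Set.copyOf(schema))) {
                throw new IllegalArgumentException("Row does not match schema: " + row.keySet());
            }
            received.add(row);
        }
    }

    // Plays the role of the Row > Main link: every generated row is passed to the output.
    static DatasetOutput run(RowGenerator generator) {
        DatasetOutput output = new DatasetOutput();
        output.syncColumns(generator);
        for (Map<String, String> row : generator.generate()) {
            output.accept(row);
        }
        return output;
    }

    public static void main(String[] args) {
        RowGenerator generator = new RowGenerator(List.of("id", "name"), 3);
        DatasetOutput output = run(generator);
        System.out.println(output.schema + " -> " + output.received.size() + " rows");
    }
}
```

The schema check in `accept` is a choice made for this sketch: it makes a mismatch between the incoming rows and the output's columns fail loudly, which is the kind of mismatch that Sync Column helps avoid.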

Results

You can now log in to the Talend Data Preparation web application, where the new dataset is available in the Datasets view.
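The behavior that steps 7 and 8 describe (the dataset is owned by the user whose credentials the Job uses, Create adds a new dataset, and Update applies the input to the dataset named in Dataset Name) can be modeled in memory as follows. This is a hypothetical Java sketch, not the Talend Data Preparation API: `DatasetOutputSketch`, `DatasetStore`, `Settings`, and the rest are illustrative names, the `localhost` host and the credentials are placeholders, and two rules are assumptions this page does not state: that Update replaces the dataset's rows, and that updating a nonexistent dataset fails.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the tDatasetOutput modes and dataset ownership.
// It does not contact Talend Data Preparation; class names and error rules are assumptions.
public class DatasetOutputSketch {

    enum Mode { CREATE, UPDATE }

    // The values entered in the tDatasetOutput fields (URL, Email, Password, Mode, Dataset Name).
    record Settings(String url, String email, String password, Mode mode, String datasetName) {}

    static final class Dataset {
        final String name;
        final String owner; // email of the user whose credentials the Job used
        List<String> rows;
        final Set<String> sharedWith = new HashSet<>();

        Dataset(String name, String owner, List<String> rows) {
            this.name = name;
            this.owner = owner;
            this.rows = List.copyOf(rows);
        }
    }

    static final class DatasetStore {
        final List<Dataset> datasets = new ArrayList<>();

        // Writes the Job's input rows according to the selected mode.
        Dataset write(Settings settings, List<String> rows) {
            if (settings.mode() == Mode.CREATE) {
                // Create: a new dataset owned by the user identified by the Email field.
                Dataset created = new Dataset(settings.datasetName(), settings.email(), rows);
                datasets.add(created);
                return created;
            }
            // Update (assumed semantics): replace the rows of the dataset named in Dataset Name.
            for (Dataset existing : datasets) {
                if (existing.name.equals(settings.datasetName())) {
                    existing.rows = List.copyOf(rows);
                    return existing;
                }
            }
            // Assumption: updating a dataset that does not exist is an error.
            throw new IllegalStateException("No dataset named " + settings.datasetName());
        }

        // In this model, only the owner can share a dataset with other users.
        void share(Dataset dataset, String requester, String otherUser) {
            if (!dataset.owner.equals(requester)) {
                throw new SecurityException(requester + " does not own " + dataset.name);
            }
            dataset.sharedWith.add(otherUser);
        }
    }

    public static void main(String[] args) {
        // Hypothetical host and credentials; 9999 is the default Talend Data Preparation port.
        Settings create = new Settings("http://localhost:9999", "user@example.com", "secret",
                Mode.CREATE, "create_dataset_from_job");
        System.out.println("port: " + URI.create(create.url()).getPort());

        DatasetStore store = new DatasetStore();
        Dataset dataset = store.write(create, List.of("row 1", "row 2"));
        System.out.println(dataset.name + " owned by " + dataset.owner);

        // Same settings, but in Update mode: the existing dataset receives the new input.
        Settings update = new Settings(create.url(), create.email(), create.password(),
                Mode.UPDATE, "create_dataset_from_job");
        store.write(update, List.of("row 3"));
        System.out.println(dataset.rows);
    }
}
```

Keeping ownership on the dataset itself, rather than on the store, mirrors the page's point that the owner is fixed by the credentials used when the dataset is created.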