Designing the Job in Talend Studio - Cloud

Talend Cloud Data Preparation User Guide

Author: Talend Documentation Team
Version: Cloud
Product: Talend Cloud
Task: Data Quality and Preparation > Cleansing data
Platform: Talend Data Preparation

To create a Live dataset, you must design a Job that uses the tDatasetOutput component as output.

Warning:

For Talend Cloud Data Preparation to retrieve the dataset, the names of your Job and of the Task created from it must start with the dataprep_ prefix. In this example, the Job is saved as dataprep_live_dataset_tmc.
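If you name many Jobs, a pre-publication check can catch a missing prefix early. The following is a minimal, hypothetical sketch, not part of Talend Studio or any Talend API. The class name DataprepNameCheck and the method hasDataprepPrefix are illustrative. The sketch also assumes that the prefix match is exact and case-sensitive, which this guide does not confirm.

```java
import java.util.List;

public class DataprepNameCheck {

    // Prefix that Job and Task names must start with (from the warning above).
    static final String REQUIRED_PREFIX = "dataprep_";

    // Hypothetical helper: returns true if the name starts with "dataprep_".
    // Assumption: the match is exact and case-sensitive.
    static boolean hasDataprepPrefix(String name) {
        return name != null && name.startsWith(REQUIRED_PREFIX);
    }

    public static void main(String[] args) {
        List<String> names = List.of("dataprep_live_dataset_tmc", "live_dataset_tmc", "Dataprep_live");
        for (String name : names) {
            // Print each candidate name with the result of the prefix check.
            System.out.println(name + " -> " + (hasDataprepPrefix(name) ? "OK" : "missing dataprep_ prefix"));
        }
    }
}
```

Running main() flags live_dataset_tmc. Under the case-sensitivity assumption, it also flags Dataprep_live.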

The simplest Job design that produces a working Live dataset connects a tRowGenerator input component to a tDatasetOutput component with a Row > Main link.

You can use any other input component to provide your data, but the Job must use tDatasetOutput as its output.

Before you begin

  • You have Talend Studio 7.2.
  • You have configured a cloud connection in the Preferences window of Talend Studio. For more information, see the Talend Cloud Getting Started Guide.
  • The name of your Job starts with the dataprep_ prefix.

Procedure

  1. In the design workspace, add an input component, tRowGenerator in this example, and click the Component tab to define its basic settings.
  2. Click the [...] button next to RowGenerator Editor to configure a schema for your data and set the number of rows to generate.
  3. Add the tDatasetOutput component in the design workspace.
  4. Link the tRowGenerator and tDatasetOutput components together using a Row > Main link.
  5. Click the Component tab of the tDatasetOutput component to define its basic settings.
  6. Click Sync Column to retrieve the schema from the previous component.
  7. Select LiveDataset in the Mode list.

    The Url and Limit fields are automatically filled.

  8. Save your Job. Then, in the Repository tree view, right-click your Job and select Publish to Cloud.

    The Publish to Cloud window opens, where you can enter a version number for your Job.

  9. Click Finish.
  10. When publication is complete, you are offered the option to open the newly created Task in Talend Cloud Management Console. Click OK to skip this step.

    Clicking Open Job Task instead opens your Task in the Talend Cloud Management Console interface. This is not required: you can go directly to the Talend Cloud Data Preparation interface.

Results

Your Job is published as a Task to Talend Cloud Management Console, where it is available under Management > Tasks and plans in the left panel menu.

What to do next

If you want this Task to run on the default Cloud Engine, go directly to the Talend Cloud Data Preparation interface to create your Live dataset.

If you want your Task to run on a Remote Engine, or on a Cloud Engine other than the default one, edit the Task in Talend Cloud Management Console:

  1. Select the dataprep_live_dataset_tmc Task.
  2. Hover over the Configuration panel and click the pen icon to edit the Task.
  3. In the Go Live > Runtime drop-down list, select your preferred engine and in the Go Live > Run type drop-down list, select To be used in Plans only.

    Do not select any other value for this field. The Task must not be scheduled, because users trigger it on demand from Talend Cloud Data Preparation.

  4. Click Go live.