Creating a preparation on a Databricks Delta table - Cloud

Talend Cloud Apps Connectors Guide

Version: Cloud
Language: English
Product: Talend Cloud
Modules: Talend Data Inventory, Talend Data Preparation, Talend Pipeline Designer
Content: Administration and Monitoring > Managing connections; Design and Development > Designing Pipelines
Last publication date: 2024-03-21
Use an Azure Data Lake Storage Gen2 connection to create a dataset from a Databricks Delta table, and then use this dataset in Talend Cloud Data Preparation.
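A Delta table stored in ADLS Gen2 is a folder inside a file system (container) of a storage account, and tools such as Spark on Databricks usually address it with an `abfss://` URI. The Python helper below is a minimal sketch that builds that URI from the storage account name, the file system name, and the table folder path. It can help you check which values belong in the connection and dataset properties. The helper name and the example values are hypothetical, and the field names in the Talend connection and dataset forms may differ.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")


def delta_table_uri(account_name: str, file_system: str, table_path: str) -> str:
    """Return the abfss:// URI of a Delta table folder in ADLS Gen2.

    The argument values used in the demo are hypothetical examples,
    not values expected by Talend.
    """
    if not _ACCOUNT_RE.fullmatch(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    if not file_system:
        raise ValueError("file_system must not be empty")
    # Drop leading/trailing slashes so the path joins cleanly after the host.
    path = table_path.strip("/")
    return f"abfss://{file_system}@{account_name}.dfs.core.windows.net/{path}"


if __name__ == "__main__":
    print(delta_table_uri("mystorageaccount", "datalake", "/delta/customers/"))
```

The demo prints `abfss://datalake@mystorageaccount.dfs.core.windows.net/delta/customers`. Comparing this URI with the location used on the Databricks side can help confirm that the connection and the dataset point at the same table folder.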

Procedure

  1. Click Connections > Add connection.
  2. In the panel that opens, select the type of connection you want to create.

    Example

    Azure Data Lake Storage Gen2
  3. Select your engine in the Engine list.
    Note:
    • For advanced data processing, it is recommended to use a Remote Engine Gen2 rather than the Cloud Engine for Design.
    • If no Remote Engine Gen2 has been created from Talend Management Console, or if one exists but appears as unavailable (meaning it is not up and running), you cannot select a Connection type in the list or save the new connection.
    • The list of available connection types depends on the engine you have selected.
  4. Select the type of connection you want to create.
    Here, select Azure Data Lake Storage Gen2.
  5. Fill in the connection properties to access your Azure Data Lake Storage Gen2 file system as described in Azure Data Lake Storage Gen2 properties, check the connection and click Add dataset.
  6. In the Add a new dataset panel, name your dataset.

    Example

    Databricks Delta table
  7. Fill in the required properties to access the Delta table in your storage account.
  8. In the Format field, select Delta.
  9. Click View sample to see a preview of your dataset, and click Validate to finalize the dataset creation.
  10. To create a new preparation on the Databricks Delta table, you can:
    • From the dataset list, hover over the dataset you want to use as source material for a preparation, click the Talend Cloud Data Preparation icon, and select Add to start working on this data directly.
    • From the preparations list, click the Add preparation button. In the form that opens, give a name to your preparation, select the source dataset that has been created beforehand and click Submit.
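Steps 8 and 9 depend on Delta being a storage format and not a single file. A Delta table is a folder of Parquet data files plus a `_delta_log` folder of JSON commit files, and a reader finds the current table version by replaying those commits in order. The Python sketch below shows this idea on a local folder. It is not how Talend reads the table, and it ignores checkpoint files. The function names, the table folder, and the demo file names are hypothetical.

```python
import json
import re
import tempfile
from pathlib import Path

# Delta commit files are named after a 20-digit, zero-padded version number.
_COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def is_delta_table(table_dir: Path) -> bool:
    """Treat a folder as a Delta table if it contains a _delta_log subfolder."""
    return (table_dir / "_delta_log").is_dir()


def active_files(table_dir: Path) -> tuple[int, set[str]]:
    """Replay commits in version order; return (latest version, live data files).

    Simplification: checkpoint files are ignored, so the result is only
    accurate while every commit file since version 0 is still present.
    """
    log_dir = table_dir / "_delta_log"
    commits = sorted(
        (int(m.group(1)), path)
        for path in log_dir.iterdir()
        if (m := _COMMIT_RE.match(path.name))
    )
    if not commits:
        raise ValueError(f"no commit files found in {log_dir}")
    files: set[str] = set()
    for _, commit_file in commits:
        # Each commit holds newline-delimited JSON, one action per line.
        for line in commit_file.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return commits[-1][0], files


def write_commit(table_dir: Path, version: int, actions: list[dict]) -> None:
    """Write a hand-made commit file (demo helper, heavily simplified actions)."""
    log_dir = table_dir / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    body = "\n".join(json.dumps(a) for a in actions) + "\n"
    (log_dir / f"{version:020d}.json").write_text(body, encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "customers"  # hypothetical table folder
        write_commit(table, 0, [{"add": {"path": "part-0000.parquet"}},
                                {"add": {"path": "part-0001.parquet"}}])
        write_commit(table, 1, [{"remove": {"path": "part-0000.parquet"}},
                                {"add": {"path": "part-0002.parquet"}}])
        version, files = active_files(table)
        print(is_delta_table(table), version, sorted(files))
```

The demo prints `True 1 ['part-0001.parquet', 'part-0002.parquet']`. The second commit removed `part-0000.parquet`, so version 1 of the table is made up of two data files. Real Delta commits also contain `protocol`, `metaData`, and `commitInfo` actions, plus more fields in each `add` action. This sketch parses those lines but ignores them.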

Results

The preparation opens directly with an empty recipe, and you can start performing preparation operations on your Databricks Delta dataset. The preparation is created in the folder you are currently working in. It is also saved automatically in the preparations list, and all the changes you make while preparing data are saved automatically as well.