Adding a dataset from Azure DLS Gen2 - 8.0

Talend Data Preparation User Guide

Version: 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Preparation
Content: Data Quality and Preparation > Cleansing data
Last publication date: 2024-03-26

Talend Data Preparation can connect to various data sources, such as databases and cloud storage, and use them as the source of a new dataset.

In this example, you want to prepare customer data stored in Azure Data Lake Storage Gen2 (ADLS Gen2). You enter your connection information directly in the Talend Data Preparation interface and create a new dataset from this data.
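The Account name, Container, and Blob path you enter in the form described in the procedure below together point to a single file in ADLS Gen2. The following Python sketch shows how those three values map to the standard ADLS Gen2 addresses. It assumes the public Azure cloud endpoint suffix `dfs.core.windows.net`; other Azure clouds use different suffixes. The function names and example values are hypothetical. The sketch does not describe how Talend Data Preparation builds its connection internally. It is only a way to sanity-check your values before entering them.

```python
import re
from urllib.parse import quote

# Azure storage account names: 3 to 24 characters, lowercase letters and digits only.
ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(account: str) -> bool:
    """Check an Account name against Azure's storage account naming rules."""
    return ACCOUNT_NAME_RE.fullmatch(account) is not None


def adls_gen2_urls(account: str, container: str, blob_path: str,
                   suffix: str = "dfs.core.windows.net") -> dict:
    """Build the HTTPS and abfss:// addresses of a file in ADLS Gen2.

    The default suffix assumes the public Azure cloud.
    """
    path = blob_path.lstrip("/")  # the blob path is relative to the container
    host = f"{account}.{suffix}"
    return {
        # Data Lake Storage Gen2 (DFS) REST endpoint: the path is percent-encoded,
        # while '/' separators are kept because quote() treats '/' as safe by default.
        "https": f"https://{host}/{container}/{quote(path)}",
        # URI form used by the Hadoop ABFS driver (for example from Spark).
        "abfss": f"abfss://{container}@{host}/{path}",
    }


urls = adls_gen2_urls("mystorageacct", "customers", "/raw/customers.csv")
print(urls["https"])  # https://mystorageacct.dfs.core.windows.net/customers/raw/customers.csv
print(urls["abfss"])  # abfss://customers@mystorageacct.dfs.core.windows.net/raw/customers.csv
```

If `is_valid_account_name` returns `False`, the Account name is malformed, and a connection test would likely fail whichever authentication type you choose.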

Before you begin

Procedure

  1. In the Datasets view of the Talend Data Preparation homepage, click the white arrow next to the Add Dataset button.
  2. Select Azure DLS Gen2.

    The Add Azure DLS Gen2 dataset form opens.

  3. In the Dataset name field, enter the name you want to give your dataset.
  4. Enter the Account name of the account you want to access.
  5. Select your Authentication type from the drop-down list.
    • If you select Shared Key, enter your Account key.
    • If you select Shared Access Signature, enter your Azure Shared Access Signature.
    • If you select Azure Active Directory, enter your Tenant ID, Client ID, and Client Secret in the corresponding fields.
  6. Click Test connection.
    If the connection is successful, the second part of the form is displayed, where you specify the location and format of your data. If not, an error message is displayed, detailing why the connection failed.
  7. Enter the Container and Blob path where the data is located.
  8. Select the format of the source data: CSV, Avro, JSON, or Parquet.
  9. Click the Add dataset button at the end of the form.
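Each authentication type asks for a different set of credentials, and a common cause of a failed connection test is an empty field or an expired Shared Access Signature. The following Python sketch is a hypothetical pre-flight checklist for the form values. It assumes the field labels used in the procedure above and a SAS token supplied as a query string (for example `?sv=...&se=...&sig=...`). It reads the token's `se` (signed expiry) parameter. The dictionary layout and function names are illustrative only and are not part of Talend Data Preparation.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs

# Fields the form asks for, per Authentication type (labels from the procedure).
AUTH_FIELDS = {
    "Shared Key": ["Account key"],
    "Shared Access Signature": ["Shared Access Signature"],
    "Azure Active Directory": ["Tenant ID", "Client ID", "Client Secret"],
}
COMMON_FIELDS = ["Dataset name", "Account name", "Container", "Blob path"]
FORMATS = {"CSV", "Avro", "JSON", "Parquet"}


def sas_expiry(sas_token: str) -> Optional[datetime]:
    """Return the signed expiry time ('se' parameter) of a SAS token, if any."""
    values = parse_qs(sas_token.lstrip("?")).get("se")
    if not values:
        return None
    expiry = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
    # A date-only 'se' value parses as a naive datetime; treat it as UTC.
    return expiry if expiry.tzinfo else expiry.replace(tzinfo=timezone.utc)


def check_form(values: dict, auth_type: str, data_format: str,
               now: Optional[datetime] = None) -> list:
    """List problems to fix before clicking Test connection (empty list = none found)."""
    if auth_type not in AUTH_FIELDS:
        return [f"unknown authentication type: {auth_type}"]
    problems = [f"missing: {field}"
                for field in COMMON_FIELDS + AUTH_FIELDS[auth_type]
                if not str(values.get(field, "")).strip()]
    if data_format not in FORMATS:
        problems.append(f"unsupported format: {data_format}")
    token = values.get("Shared Access Signature", "")
    if auth_type == "Shared Access Signature" and token:
        expiry = sas_expiry(token)
        if expiry is not None and expiry <= (now or datetime.now(timezone.utc)):
            problems.append(f"SAS token expired at {expiry.isoformat()}")
    return problems
```

For example, passing values that contain only an Account key with the Azure Active Directory type reports the missing Tenant ID, Client ID, and Client Secret. Passing a token whose `se` date is in the past reports the expiry before you run the connection test.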

Results

The data extracted from ADLS Gen2 opens directly in the grid, and you can start working on your preparation as you usually do.

The data remains stored in ADLS Gen2; Talend Data Preparation only retrieves a sample of it on demand.
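This page does not describe how Talend Data Preparation takes its sample. Purely as an illustration of the idea, the following Python sketch reads the header and the first few rows of a CSV source and stops there, leaving the rest of the source unread. The function name and the sample data are hypothetical.

```python
import csv
import io
from itertools import islice


def sample_csv(stream, n_rows: int):
    """Return the header and at most n_rows data rows from a CSV text stream.

    csv.reader pulls lines lazily, so it stops pulling lines once the
    sample is collected instead of consuming the whole stream.
    """
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:  # empty source
        return [], []
    return header, list(islice(reader, n_rows))


source = io.StringIO("id,name,country\n1,Ann,FR\n2,Bob,US\n3,Chen,CN\n")
header, rows = sample_csv(source, 2)
print(header)  # ['id', 'name', 'country']
print(rows)    # [['1', 'Ann', 'FR'], ['2', 'Bob', 'US']]
```

After the call, the third data row is still unread in `source`. This mirrors the point above: only a sample is fetched, and the full data stays in ADLS Gen2.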

The dataset is added to the list in the Datasets view of the homepage.