Azure Data Lake Storage Gen2 properties - Cloud

Talend Cloud Apps Connectors Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Data Inventory, Talend Data Preparation, Talend Pipeline Designer
Content: Administration and Monitoring > Managing connections; Design and Development > Designing Pipelines
Last publication date: 2024-03-21

Properties to configure to establish a connection to a given Azure Data Lake Storage Gen2 file system.

Azure Data Lake Storage Gen2 connection

Selection: Select or enter Azure Data Lake Storage Gen2.
Configuration:
  • Engine: Select your engine in the list.
  • Main:
    • Authentication Method: Select how you want to authenticate to your storage account:
      • Shared Key: Enter the key associated with the storage account you need to access. Two keys are available for each account and, by default, either of them can be used for this access. To learn how to get your key, read Manage a storage account.
      • Shared Access Signature: Enter your account SAS token. You can get the SAS token for each allowed service from the Microsoft Azure portal after generating the SAS. The expected format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue, or table), and <$sastoken> is the SAS token value. For more information, read Constructing the Account SAS URI.
      • Azure Active Directory: Enter the Tenant ID, Client ID, and Client Secret associated with your account for identity-based authorization of requests to the Blob and Queue services. For more information, read Authorize with Azure Active Directory.
    • Account Name: Enter the name of the Data Lake Storage account you need to access. Ensure that the system administrator has granted you the appropriate access permissions to this account.
    • Endpoint suffix: Enter the endpoint suffix corresponding to the Azure cloud that hosts your account. For example:
      • core.windows.net (default, Azure Public)
      • core.chinacloudapi.cn (Azure China Cloud)
  • Advanced:
    • Timeout: Set the maximum number of seconds to wait for a connection to become available. If this time is exceeded and the connection is still unavailable, an exception is thrown.
Description: Enter a display name (mandatory) and a description (optional) for the connection.
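To illustrate how the Account Name, Endpoint suffix, and Shared Access Signature values fit together, the following Python sketch composes the Data Lake Storage Gen2 endpoint (https://<account>.dfs.<endpoint suffix>) and an account SAS URL of the form https://<storagename>.<service>.core.windows.net/<sastoken>. The helper names dfs_endpoint and account_sas_url are hypothetical and are not part of Talend or the Azure SDK. Substituting the endpoint suffix for core.windows.net in the SAS URL, and accepting a token with or without a leading "?", are assumptions of this sketch.

```python
# Hypothetical helpers: illustrate how connection properties combine into URLs.

SAS_SERVICES = {"blob", "file", "queue", "table"}  # allowed services for an account SAS


def dfs_endpoint(account_name: str, endpoint_suffix: str = "core.windows.net") -> str:
    # Data Lake Storage Gen2 is served from the account's "dfs" endpoint.
    return f"https://{account_name}.dfs.{endpoint_suffix}"


def account_sas_url(storage_name: str, service: str, sas_token: str,
                    endpoint_suffix: str = "core.windows.net") -> str:
    # Pattern: https://<storagename>.<service>.<endpoint suffix>/<sastoken>
    if service not in SAS_SERVICES:
        raise ValueError(f"service must be one of {sorted(SAS_SERVICES)}")
    # Assumption: accept the token with or without its leading "?".
    query = sas_token if sas_token.startswith("?") else "?" + sas_token
    return f"https://{storage_name}.{service}.{endpoint_suffix}/{query}"


if __name__ == "__main__":
    print(dfs_endpoint("myaccount"))
    print(dfs_endpoint("myaccount", "core.chinacloudapi.cn"))
    print(account_sas_url("myaccount", "blob", "sv=2022-11-02&ss=b&sig=PLACEHOLDER"))
```

Run as a script, this prints https://myaccount.dfs.core.windows.net, then https://myaccount.dfs.core.chinacloudapi.cn, then https://myaccount.blob.core.windows.net/?sv=2022-11-02&ss=b&sig=PLACEHOLDER.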

Azure Data Lake Storage Gen2 dataset

Dataset name: Enter a display name for the dataset. This name is used as the unique identifier of the dataset in all Talend Cloud apps.
Connection: Select your connection in the list. If you are creating a dataset based on an existing connection, this field is read-only.
Filesystem: Select or enter the name of your Azure Data Lake Storage file system.
Blob path: Enter the path to the directory containing the file to be retrieved.
Format: Select the format of the file to be retrieved from the list, then enter or select the information related to this format:
  • CSV:
    • Field delimiter: Select the type of field separator used in the file to be retrieved. If you select Other, you can enter a custom field delimiter in the custom field delimiter field.
    • Record separator: Select the type of record separator used in the file to be retrieved. If you select Other, you can enter a custom record delimiter in the custom record delimiter field.
    • Text enclosure character: Enter the character used to enclose the fields.
    • Escape character: Enter the character used to escape special characters, such as the text enclosure character, in the records to be retrieved.
    • Header: Enable this option if the file to be retrieved contains header lines, then enter or select the number of header lines to skip.
    • CSV schema: Enter the schema corresponding to your CSV file.
    • File encoding: Select the type of encoding used in the file to be retrieved. If you select Other, you can enter a custom encoding type in the Custom encoding field.
  • Avro: No specific parameters required for this format.
  • JSON: No specific parameters required for this format.
  • Parquet: No specific parameters required for this format.
  • Delta: No specific parameters required for this format.
    Important: Partitioned Delta tables are not supported, and partition columns are not returned.
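To show how the CSV options map onto a typical parser, the following sketch reads CSV bytes with Python's standard csv module. It is an illustration only, not how Talend parses files; the function name read_csv and its default values are hypothetical. Python's csv reader only recognizes \n, \r\n, and \r as record separators, so a custom (Other) record separator is outside the scope of this sketch.

```python
import csv
import io
from typing import List, Optional


def read_csv(raw: bytes,
             field_delimiter: str = ",",
             text_enclosure: str = '"',
             escape_char: Optional[str] = "\\",
             header_lines: int = 1,
             encoding: str = "utf-8") -> List[List[str]]:
    """Hypothetical helper mirroring the CSV format options (defaults assumed)."""
    # File encoding: decode the raw bytes before parsing.
    text = raw.decode(encoding)
    # newline="" leaves line endings untouched so the csv reader handles them.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=field_delimiter,   # Field delimiter
        quotechar=text_enclosure,    # Text enclosure character
        escapechar=escape_char,      # Escape character
    )
    rows = list(reader)
    # Header: drop the first N parsed records (not raw physical lines).
    return rows[header_lines:]


if __name__ == "__main__":
    sample = b'id;name\n1;"O\\"Brien"\n2;"x;y"\n'
    print(read_csv(sample, field_delimiter=";"))
```

Run as a script, this prints [['1', 'O"Brien'], ['2', 'x;y']]: the header record is skipped, the escaped quote is kept as a literal character, and the semicolon inside the enclosed field is not treated as a delimiter.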