Bulk loading data from Azure DLS Gen2 into Azure Synapse - Cloud

Talend Cloud Apps Connectors Guide

Version
Cloud
Language
English
Product
Talend Cloud
Module
Talend Data Inventory
Talend Data Preparation
Talend Pipeline Designer
Content
Administration and Monitoring > Managing connections
Design and Development > Designing Pipelines
Last publication date
2024-03-21

This scenario aims at helping you set up and use connectors in a pipeline. You are advised to adapt it to your environment and use case.

Procedure

  1. Click Connections > Add connection.
  2. In the panel that opens, select the type of connection you want to create.

    Example

    ADLS Gen2
  3. Select your engine in the Engine list.
    Note:
    • It is recommended to use the Remote Engine Gen2 rather than the Cloud Engine for Design for advanced processing of data.
    • If no Remote Engine Gen2 has been created from Talend Management Console or if it exists but appears as unavailable which means it is not up and running, you will not be able to select a Connection type in the list nor to save the new connection.
    • The list of available connection types depends on the engine you have selected.
  4. Select the type of connection you want to create.
    Here, select ADLS Gen2.
  5. Fill in the connection properties to access your Azure Data Lake Storage Gen2 file system as described in Azure Data Lake Storage Gen2 properties, check the connection and click Add dataset.
  6. In the Add a new dataset panel, name your dataset.

    Example

    BKO Taxi On ADLS Gen2
  7. Fill in the required properties to access the file located in your storage account and click View sample to see a preview of your dataset sample.
    In this example, a CSV file containing data about taxi trip costs in the city of Bamako, Mali is retrieved in the talend folder of an Azure file system called talend-fs. You are able to see your file system directories from the Storage Explorer page of your Azure Storage Account.
  8. Do the same to add the Azure Synapse table that will be created when running your pipeline, named taxi_data in this example. Fill in the connection properties as described in Azure Synapse properties.
  9. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  10. Give the pipeline a meaningful name.

    Example

    From ADLS Gen2 to Synapse - trip cost per distance covered
  11. Click ADD SOURCE and select your source dataset, BKO taxi on ADSL Gen2 in the panel that opens.
  12. Click to add processors to the pipeline, for example a Type converter to convert string fields to int or double type fields, a Field selector to filter and rename some records and an Aggregate processor to calculate the cost of a trip according to the distance covered.
  13. (Optional) Click the last processor to preview the processed data.
  14. Click the ADD DESTINATION item on the pipeline to open the panel allowing to select the Azure Blob in which your output data will be loaded.
  15. Give a meaningful name to the Destination; bulk load to Synapse for example.
  16. In the Configuration tab of the destination, select the Action you want to perform on the table (Bulk load) then select the Blob connection to be used. See Azure Blob Storage for more information on Azure Blob Storage configuration.
  17. Click Save to save your configuration.
  18. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  19. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.

Results

Your pipeline is being executed, the taxi cost information that was stored on Azure DLS Gen2 has been aggregated per distance covered and the output flow is loaded into the Azure Synapse table that is created when running the pipeline.