Applying a preparation on ADLS Gen2 Delta tables - Cloud - 8.0

Azure Data Lake Store

Version: Cloud, 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Open Studio for Big Data, Talend Open Studio for Data Integration, Talend Open Studio for ESB, Talend Real-Time Big Data Platform
Module: Talend Studio
Last publication date: 2023-06-07

This scenario retrieves data from an Azure Data Lake Storage (ADLS) Gen2 file system, prepares the data, and then displays it.

For more technologies supported by Talend, see Talend components.

This scenario shows how to retrieve a Delta table from an ADLS Gen2 file system, apply a compatible preparation directly in the flow of the Job, and read the resulting data.

The tAzureAdlsGen2Input component gives you access to your Azure storage, and more specifically to your Delta tables. By placing the tDataprepRun component in the middle of your Job, you can reuse an existing preparation created in Talend Data Preparation to transform and clean the data before reading it or writing it to the destination of your choice.

The following scenario creates a simple Job that:

  • Retrieves customer data from a Databricks Delta table
  • Directly applies a preparation with a compatible schema
  • Reads the data in the output component

In this example, the Delta table contains basic customer information, such as name, age, birthday, and phone number.

This scenario assumes that a preparation has been created beforehand, on a dataset with the same schema as your input data for the Job. In this case, the existing preparation is called preparation_adlsgen2.

Note: Using the same schema on both ends ensures a coherent result. The Job still runs if the schemas differ, but the result is then not guaranteed to be coherent.
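
Because a schema mismatch does not stop the Job, it can help to compare the two column lists before running it. The following is a minimal sketch in plain Python that compares column names only. The column lists are made up for illustration and are not read from Talend Studio, Talend Data Preparation, or ADLS Gen2.

```python
def schema_differences(input_columns, preparation_columns):
    """Compare two lists of column names and report what does not line up."""
    input_set = set(input_columns)
    preparation_set = set(preparation_columns)
    return {
        # Columns the preparation expects but the Job input does not provide.
        "missing_in_input": sorted(preparation_set - input_set),
        # Columns the Job input provides but the preparation does not know.
        "unexpected_in_input": sorted(input_set - preparation_set),
    }


# Hypothetical column names for the Job input and for the dataset
# that the preparation was built on.
job_input = ["first_name", "last_name", "age", "birthday", "phone"]
preparation = ["first_name", "last_name", "age", "birthday", "phone_number"]

diff = schema_differences(job_input, preparation)
print(diff)
```

An empty list under both keys means the column names match. This check does not look at column types or order.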

This simple preparation converts last names to upper case and changes the date format.
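
To make the effect of such a preparation concrete, the sketch below applies the same two steps to a few made-up records in plain Python. It only illustrates the transformation and is not how tDataprepRun works internally. The field names (last_name, birthday) and both date formats are assumptions, since the scenario does not specify them.

```python
from datetime import datetime

# Assumed date formats: day/month/year in, ISO year-month-day out.
INPUT_DATE_FORMAT = "%d/%m/%Y"
OUTPUT_DATE_FORMAT = "%Y-%m-%d"


def apply_preparation(record):
    """Return a copy of the record with both preparation steps applied."""
    prepared = dict(record)
    # Step 1: put the last name in upper case.
    prepared["last_name"] = record["last_name"].upper()
    # Step 2: change the date format of the birthday column.
    parsed = datetime.strptime(record["birthday"], INPUT_DATE_FORMAT)
    prepared["birthday"] = parsed.strftime(OUTPUT_DATE_FORMAT)
    return prepared


# Made-up rows standing in for the customer Delta table.
customers = [
    {"first_name": "Jane", "last_name": "Doe", "birthday": "14/03/1985"},
    {"first_name": "John", "last_name": "Smith", "birthday": "02/11/1990"},
]

prepared_customers = [apply_preparation(row) for row in customers]
for row in prepared_customers:
    print(row)
```

Each input row is left unchanged, and the prepared copy carries the upper-case last name and the reformatted date.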