Reading delta data from the filesystem - 7.3

Delta Lake

Version
7.3
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-21

Configure tDeltaLakeInput to read different snapshots of the data about US flights so that your Job can then calculate how the flights evolved between those snapshots.

Each snapshot was assigned a version number when it was written to the Delta Lake dataset used in this scenario.

Procedure

  1. Configure the storage configuration component that provides the connection information to your filesystem. In this example, it is a tS3Configuration component.
  2. Double-click the tDeltaLakeInput component labeled flights_latest_version to open its Component view.
  3. Select the Select a storage configuration component check box to reuse the connection information defined in tS3Configuration.
  4. Click Edit schema to open the schema editor. In this editor, define the schema of the input data.
  5. In the Folder/File field, enter the directory where the flight dataset is stored, in the S3 bucket specified in tS3Configuration.
  6. Configure the other tDeltaLakeInput component in the same way, but also select the Specify time travel version check box and enter "0" (including the double quotation marks) in the Version field that is displayed. In this scenario, version 0 is the first version of the data about US flights.
    Without using the time travel feature, tDeltaLakeInput reads the latest snapshot of your data; the time travel feature allows you to specify the snapshot to be read.
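
For readers who want to see what the two reads amount to outside the Studio, the following is a minimal sketch using the Spark DataFrame reader for Delta Lake. It is not the code that tDeltaLakeInput generates. The helper name read_flights is hypothetical, and it assumes that you already have a SparkSession with Delta Lake support and S3 access configured.

```python
from typing import Any, Optional


def read_flights(spark: Any, path: str, version: Optional[int] = None) -> Any:
    """Read the Delta Lake dataset stored at `path`.

    `spark` is assumed to be an existing SparkSession with Delta Lake
    support configured. When `version` is None, the latest snapshot is
    read; otherwise the requested snapshot is read through time travel.
    """
    reader = spark.read.format("delta")
    if version is not None:
        # "versionAsOf" is the Delta Lake reader option for time travel by version.
        reader = reader.option("versionAsOf", version)
    return reader.load(path)


# Example usage (the bucket and folder below are placeholders, not values from this scenario):
# flights_latest = read_flights(spark, "s3a://my-bucket/flights")
# flights_first = read_flights(spark, "s3a://my-bucket/flights", version=0)
```

Omitting the version corresponds to the flights_latest_version component, and passing version 0 corresponds to the component configured with the Specify time travel version check box.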