Reading delta data from the filesystem

Configure the tDeltaLakeInput components to read different snapshots of the data about US flights so that your Job can then calculate how the flights evolved between those snapshots.

Each snapshot was assigned a version number when it was written to the Delta Lake dataset used in this example.

Procedure

  1. Configure the storage configuration component that provides the connection information for your filesystem. In this example, it is a tS3Configuration component.
  2. Double-click the tDeltaLakeInput component labeled flights_latest_version to open its Component view.
  3. Select the Select a storage configuration component check box to reuse the connection information defined in tS3Configuration.
  4. Click Edit schema to open the schema editor. In this editor, define the schema of the input data.
  5. In the Folder/File field, enter the directory where the flight dataset is stored, in the S3 bucket specified in tS3Configuration.
  6. Configure the other tDeltaLakeInput component in the same way, but also select the Specify time travel version check box and enter "0" (including the double quotation marks) in the Version field that is displayed. In this scenario, version 0 is the first version of the data about US flights.
    Without the time travel feature, tDeltaLakeInput reads the latest snapshot of your data; with the time travel feature, you specify which snapshot to read.
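
For readers who want to see what the two read configurations amount to outside the Studio, here is a minimal PySpark-style sketch. It is not the code that tDeltaLakeInput generates; it only illustrates the same idea with Delta Lake's `versionAsOf` reader option. The function names `delta_read_options` and `read_flights` are hypothetical, and the `spark` argument is assumed to be a SparkSession already set up with Delta Lake and access to your S3 bucket.

```python
from typing import Any, Dict, Optional


def delta_read_options(version: Optional[int] = None) -> Dict[str, str]:
    """Build Delta reader options (hypothetical helper).

    No version means "read the latest snapshot"; a version number
    enables time travel to that snapshot.
    """
    if version is None:
        return {}
    if version < 0:
        raise ValueError("Delta table versions start at 0")
    return {"versionAsOf": str(version)}


def read_flights(spark: Any, path: str, version: Optional[int] = None) -> Any:
    """Read the Delta dataset at `path`, optionally at a given version.

    `spark` is assumed to be a SparkSession configured for Delta Lake;
    `path` is whatever folder holds the flight dataset.
    """
    reader = spark.read.format("delta")
    for key, value in delta_read_options(version).items():
        reader = reader.option(key, value)
    return reader.load(path)
```

Under these assumptions, calling `read_flights(spark, path)` corresponds to the flights_latest_version component, and `read_flights(spark, path, version=0)` corresponds to the component with the Specify time travel version check box selected and "0" in the Version field.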
