
Reading data from an HDFS connection on Spark

Using predefined HDFS metadata, you can read data from an HDFS file system on Spark.

Before you begin

Ensure that your Job contains a tHDFSConfiguration component that holds the connection information for your HDFS file system.

Procedure

  1. In the Designer, add an input component.

    Example

    Add a tFileInputDelimited component.
  2. Double-click the component.
    The component is automatically configured with the connection information from the tHDFSConfiguration component, under Storage.
  3. Click the […] button next to Edit schema.
  4. Click the plus button to add the data columns you need.

    Example

    1. CustomerID
    2. FirstName
    3. LastName
  5. Select the Type of each column.

    Example

    For CustomerID, select the Integer Type.
  6. Click OK.
  7. In the File Name field, enter the file path and name of your choice.
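The File Name can be a path on the HDFS file system. As an illustration only (the host, port, and path below are hypothetical, not values from this procedure), a full HDFS URI typically looks like:

```
hdfs://namenode.example.com:8020/user/talend/customers.csv
```
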

Results

The tFileInputDelimited component is now configured to read data from HDFS on Spark.
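Conceptually, the schema you defined in the steps above maps each delimited record to typed columns. The following is a minimal plain-Java sketch of that mapping, not Talend-generated code; the semicolon separator and the `parseRow` helper are assumptions for illustration, and only the example columns CustomerID (Integer), FirstName, and LastName come from the procedure:

```java
// Sketch only (not Talend-generated code): shows how a schema of
// CustomerID (Integer), FirstName, LastName types the fields of one
// delimited record.
public class SchemaSketch {
    static final String DELIMITER = ";"; // assumed field separator

    static Object[] parseRow(String line) {
        String[] fields = line.split(DELIMITER, -1);
        return new Object[] {
            Integer.parseInt(fields[0].trim()), // CustomerID -> Integer
            fields[1],                          // FirstName  -> String
            fields[2]                           // LastName   -> String
        };
    }

    public static void main(String[] args) {
        Object[] row = parseRow("42;Ada;Lovelace");
        System.out.println(row[0] + " " + row[1] + " " + row[2]); // prints: 42 Ada Lovelace
    }
}
```

On Spark, the component applies the same schema to every record of the file in parallel; the sketch above only illustrates the per-record typing.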
