
Reading data from HDFS using metadata

Use the tHDFSInput component to read data from HDFS through the HDFS connection metadata stored in the Repository.

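Before you begin

Make sure that a Hadoop cluster connection with HDFS metadata is defined under Metadata > Hadoop Cluster in the Repository, and that your Job already contains a subjob in which a tRowGenerator component writes data to HDFS through a tHDFSOutput component.
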
Procedure

  1. In the Repository, expand Metadata > Hadoop Cluster, then expand the Hadoop cluster metadata of your choice.
    1. Drag and drop the HDFS metadata onto the Designer.
    2. From the list of components that appears, select tHDFSInput.
  2. Double-click the tHDFSInput component.

    The component is already configured with the predefined HDFS metadata connection information.

  3. In the File Name field, enter the path and name of the HDFS file to read, for example the file written by your tHDFSOutput component.
  4. Click the […] button next to Edit schema.
  5. Click the plus button to add a column.
    1. In the Column field, enter a name.

      Example

      1. CustomerID
      2. FirstName
      3. LastName
    2. In the Type field, select the type of each column.

      Example

      1. For CustomerID, select Integer.
      2. For FirstName and LastName, select String.
    3. Click OK.
  6. Right-click the tRowGenerator component.
    1. Select Trigger > On Subjob OK.
    2. Click the tHDFSInput component to link the two subjobs.
  7. Add a tSortRow component.
  8. Right-click the tHDFSInput component.
    1. Select Row > Main.
    2. Click the tSortRow component to link the two.
  9. Double-click the tSortRow component.
    1. Click Sync columns.
      The tSortRow component inherits the schema from the tHDFSInput component.
  10. In the Criteria table, click the plus button.
    A row appears with the first column of your tHDFSInput component schema (CustomerID) selected as the sort key.
  11. Add a tLogRow component.
  12. Right-click the tSortRow component.
    1. Select Row > Main.
    2. Click the tLogRow component to link the two.
      Your Designer now contains two subjobs linked by an On Subjob OK trigger: tRowGenerator > tHDFSOutput, and tHDFSInput > tSortRow > tLogRow.
  13. Double-click the tLogRow component.
    1. In the Mode area, select Table (print values in cells of a table).
  14. In the Run view, click Run.
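The schema defined in step 5 tells tHDFSInput how to turn each line of the HDFS file into a typed row. The following plain-Java sketch shows that idea under stated assumptions: it is not the code Talend Studio generates, the `;` field separator is an assumption (use the separator your file actually has), and the class and method names (`CustomerSchema`, `parseLine`, `parseAll`) are hypothetical. It requires Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CustomerSchema {

    // Hypothetical typed row mirroring the schema from step 5:
    // CustomerID (Integer), FirstName (String), LastName (String).
    record Customer(int customerId, String firstName, String lastName) { }

    // Split one delimited line and convert each field to its schema type.
    // The separator is passed in because the real file's separator is an assumption.
    static Customer parseLine(String line, String separator) {
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException(
                "Expected 3 fields, got " + fields.length + ": " + line);
        }
        return new Customer(Integer.parseInt(fields[0].trim()), fields[1], fields[2]);
    }

    // Parse every non-empty line into a typed row, preserving file order.
    static List<Customer> parseAll(List<String> lines, String separator) {
        List<Customer> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.isEmpty()) {
                rows.add(parseLine(line, separator));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample lines standing in for the content of the HDFS file.
        List<String> sample = List.of("3;Ada;Lovelace", "1;Alan;Turing");
        for (Customer c : parseAll(sample, ";")) {
            System.out.println(c);
        }
    }
}
```

A line whose field count does not match the schema raises an exception here; in Talend Studio, how such rows are handled depends on the component settings.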

Results

The tRowGenerator component (or whichever input component you use) sends data to the tHDFSOutput component, which writes it to HDFS. When that subjob completes, the tHDFSInput component reads the file back and passes the rows to the tSortRow component, which sorts them by CustomerID. The tLogRow component then displays the sorted HDFS data in the Run view.
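The second subjob sorts the rows by the first schema column and prints them as a table. The plain-Java sketch below approximates that behavior under assumptions: it is not Talend's generated code, it assumes a numeric ascending sort on CustomerID, and its simple bordered layout does not reproduce tLogRow's exact table format. The names `SortAndLog`, `sortByCustomerId`, and `toTable` are hypothetical. It requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortAndLog {

    // Hypothetical row type matching the CustomerID/FirstName/LastName schema.
    record Customer(int customerId, String firstName, String lastName) { }

    // Sort a copy of the rows numerically, ascending, on CustomerID.
    static List<Customer> sortByCustomerId(List<Customer> rows) {
        List<Customer> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingInt(Customer::customerId));
        return sorted;
    }

    // Render rows as a text table with columns padded to their widest value.
    static String toTable(List<Customer> rows) {
        String[] headers = {"CustomerID", "FirstName", "LastName"};
        int[] widths = new int[headers.length];
        for (int i = 0; i < headers.length; i++) {
            widths[i] = headers[i].length();
        }
        List<String[]> cells = new ArrayList<>();
        for (Customer c : rows) {
            String[] row = {String.valueOf(c.customerId()), c.firstName(), c.lastName()};
            for (int i = 0; i < row.length; i++) {
                widths[i] = Math.max(widths[i], row[i].length());
            }
            cells.add(row);
        }
        StringBuilder sb = new StringBuilder();
        appendRow(sb, headers, widths);
        // Separator line between the header and the data rows.
        sb.append('|');
        for (int i = 0; i < widths.length; i++) {
            sb.append("-".repeat(widths[i])).append(i < widths.length - 1 ? '+' : '|');
        }
        sb.append('\n');
        for (String[] row : cells) {
            appendRow(sb, row, widths);
        }
        return sb.toString();
    }

    // Append one left-aligned, pipe-delimited row.
    static void appendRow(StringBuilder sb, String[] row, int[] widths) {
        sb.append('|');
        for (int i = 0; i < row.length; i++) {
            sb.append(String.format("%-" + widths[i] + "s", row[i])).append('|');
        }
        sb.append('\n');
    }

    public static void main(String[] args) {
        List<Customer> rows = List.of(
            new Customer(3, "Ada", "Lovelace"),
            new Customer(1, "Alan", "Turing"),
            new Customer(2, "Grace", "Hopper"));
        System.out.print(toTable(sortByCustomerId(rows)));
    }
}
```

Sorting a copy leaves the input list untouched, which mirrors how each component in the Job passes its own output downstream.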
