Reading data from HDFS using metadata - 8.0

First steps using Big Data in Talend Studio

Version: 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Open Studio for Big Data, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development > Designing Jobs > Hadoop distributions
Last publication date: 2024-02-06

Using the tHDFSInput component, you can read data from HDFS.

Before you begin

Procedure

  1. In the Repository, expand Metadata > Hadoop Cluster, then expand the Hadoop cluster metadata of your choice.
    1. Drag and drop the HDFS metadata onto the Designer.
    2. Select a tHDFSInput component.
  2. Double-click the tHDFSInput component.

    The component is already configured with the predefined HDFS metadata connection information.

  3. In the File Name field, enter the file path and name of your choice.
  4. Click the […] button next to Edit schema.
  5. Click the plus button to add a column. Repeat for each column you need.
    1. In the Column field, enter a name.

      Example

      1. CustomerID
      2. FirstName
      3. LastName
    2. Select the type of each column.

      Example

      1. For CustomerID, select the Integer Type.
      2. For FirstName and LastName, select the String Type.
    3. Click OK.
  6. Right-click the tRowGenerator component.
    1. Select Trigger > On Subjob OK.
    2. Click on the tHDFSInput component to link the two.
  7. Add a tSortRow component.
  8. Right-click the tHDFSInput component.
    1. Select Row > Main.
    2. Click on the tSortRow component to link the two.
  9. Double-click the tSortRow component.
    1. Click Sync columns.
      The tSortRow component inherits the schema from the tHDFSInput component.
  10. Click the plus button to add a sort criterion.
    The first column of your tHDFSInput component schema appears.
  11. Add a tLogRow component.
  12. Right-click the tSortRow component.
    1. Select Row > Main.
    2. Click on the tLogRow component to link the two.
      The Designer now shows the tHDFSInput, tSortRow, and tLogRow components linked in that order by Row > Main connections.
  13. Double-click the tLogRow component.
    1. Select Table (print values in cells of a table).
  14. In the Run view, click Run.
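
As a rough illustration of what the schema from step 5 describes, the following Java sketch maps one line of a delimited file to the three example columns. The class name CustomerRow, the ";" field separator, and the sample line are assumptions made for this example. This is not the code that Talend Studio generates for the Job.

```java
// Hypothetical stand-in for one row of the example schema:
// CustomerID (Integer), FirstName (String), LastName (String).
final class CustomerRow {
    final Integer customerId;
    final String firstName;
    final String lastName;

    CustomerRow(Integer customerId, String firstName, String lastName) {
        this.customerId = customerId;
        this.firstName = firstName;
        this.lastName = lastName;
    }

    // Parses one delimited line. The ";" separator is an assumption;
    // use whatever separator your file actually contains.
    static CustomerRow parse(String line) {
        // The -1 limit keeps trailing empty fields instead of dropping them.
        String[] fields = line.split(";", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException(
                "Expected 3 fields but found " + fields.length + ": " + line);
        }
        // Integer.valueOf throws NumberFormatException if the ID is not numeric.
        return new CustomerRow(Integer.valueOf(fields[0].trim()), fields[1], fields[2]);
    }

    public static void main(String[] args) {
        CustomerRow row = CustomerRow.parse("42;Ada;Lovelace");
        System.out.println(row.customerId + " " + row.firstName + " " + row.lastName);
    }
}
```

A line whose first field is not a number, or that does not contain exactly three fields, is rejected with an exception. This is one simple way to surface a mismatch between the file and the declared column types.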

Results

Your input component (such as the tRowGenerator component) provides data to the tHDFSOutput component, which writes it to your HDFS system. When this operation is complete, the tHDFSInput component reads the data back from HDFS and passes it to the tSortRow component, which sorts it. The tLogRow component then displays the sorted data.
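
The read, sort, and display flow described above can be sketched in plain Java on in-memory data. This is a minimal sketch under stated assumptions: the class SortedCustomersSketch, the ";" separator, the sample rows, and the choice of a numeric ascending sort on CustomerID are all hypothetical. It does not use Talend or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory imitation of the tHDFSInput -> tSortRow -> tLogRow flow.
final class SortedCustomersSketch {

    // Splits "CustomerID;FirstName;LastName" lines and sorts them by
    // CustomerID in ascending numeric order (an assumed sort criterion).
    static List<String[]> sortByCustomerId(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(";", -1));
        }
        rows.sort(Comparator.comparingInt((String[] r) -> Integer.parseInt(r[0].trim())));
        return rows;
    }

    // Renders the rows as a simple fixed-width table, one row per line.
    static String toTable(List<String[]> rows) {
        String format = "%-10s|%-10s|%-10s%n";
        StringBuilder table = new StringBuilder(
            String.format(format, "CustomerID", "FirstName", "LastName"));
        for (String[] r : rows) {
            table.append(String.format(format, r[0], r[1], r[2]));
        }
        return table.toString();
    }

    public static void main(String[] args) {
        // Sample lines standing in for the content of the file read from HDFS.
        List<String> lines = Arrays.asList("3;Grace;Hopper", "1;Ada;Lovelace", "2;Alan;Turing");
        System.out.print(toTable(sortByCustomerId(lines)));
    }
}
```

Parsing the ID before comparing makes the sort numeric rather than alphabetical, so 10 comes after 9 instead of after 1.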