In this step, we will configure the last Job, F_Read_Results, to read the results data from Hadoop and display them on the standard system console.
Double-click the first tHDFSInput component to open its Basic settings view.
- To reuse a centralized HDFS connection, click the Property Type list box and select Repository, then click the [...] button to open the [Repository Content] dialog box. Select the HDFS connection defined for connecting to the HDFS system and click OK. All the connection details are automatically filled in the respective fields.
- Apply the generic schema of ip_count to this component. The schema should contain two columns: host (string, 50 characters) and count (integer, 5 characters).
- In the File Name field, enter the path to the result file in HDFS, /user/hdp/weblog/apache_ip_cnt/part-r-00000 in this example.
- From the Type list, select the type of the file to read, Text File in this example.
- In the Basic settings view of the tLogRow component, select the Table option for better readability.
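The steps above configure tHDFSInput to read the tab-separated MapReduce output and tLogRow to print each row. For illustration only, an equivalent read of the same result file outside the Studio could look like the following plain-Java sketch; the namenode URI (hdfs://namenode:8020) and the hdp user are assumptions to be adjusted to your cluster, and Talend generates its own code internally.

```java
// Minimal sketch (not Talend-generated code): read the tab-separated ip_count
// results from HDFS and print them as a simple table, mirroring tHDFSInput + tLogRow.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadIpCountResults {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connect as the given HDFS user; URI and user name are assumptions.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf, "hdp");
        Path result = new Path("/user/hdp/weblog/apache_ip_cnt/part-r-00000");

        System.out.printf("%-50s | %5s%n", "host", "count");
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(result)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // MapReduce text output is key<TAB>value: the host, then its count.
                String[] fields = line.split("\t");
                System.out.printf("%-50s | %5d%n", fields[0], Integer.parseInt(fields[1]));
            }
        }
    }
}
```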
Configure the other subjob in the same way, but in the second tHDFSInput component:
- Apply the generic schema of code_count, or configure the schema of this component manually so that it contains two columns: code (integer, 5 characters) and count (integer, 5 characters).
- Fill the File Name field with /user/hdp/weblog/apache_code_cnt/part-r-00000.
- Upon completion of the component settings, press Ctrl+S to save your Job configurations.
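For reference only, the two generic schemas used by this Job can be pictured as the plain Java types below. Talend generates its own row structures internally, so the class and field names here are purely illustrative; only the column names, types, and lengths come from the schema definitions above.

```java
// Illustrative mapping of the generic schemas to plain Java types.
// Talend's generated row classes differ; only the columns mirror the Repository schemas.
public final class ResultSchemas {

    /** ip_count schema: host (string, 50 characters) and count (integer, 5 characters). */
    public static final class IpCountRow {
        public String host;    // length 50
        public Integer count;  // length 5
    }

    /** code_count schema: code (integer, 5 characters) and count (integer, 5 characters). */
    public static final class CodeCountRow {
        public Integer code;   // length 5
        public Integer count;  // length 5
    }

    private ResultSchemas() { }
}
```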