Configuring the last Job

In this step, we will configure the last Job, F_Read_Results, to read the result data from HDFS and display it on the standard system console.
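
No code needs to be written in the Studio; the tHDFSInput and tLogRow components generate it from the settings below. For reference only, a minimal hand-written sketch of the same read-and-display logic against the Hadoop client API might look as follows. The ReadIpCounts class name and the NameNode URI are illustrative assumptions, not values from this example:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadIpCounts {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumption: replace with your NameNode URI; in the Studio this
            // value comes from the centralized HDFS connection metadata.
            conf.set("fs.defaultFS", "hdfs://localhost:8020");

            try (FileSystem fs = FileSystem.get(conf);
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         fs.open(new Path("/user/hdp/weblog/apache_ip_cnt/part-r-00000")),
                         StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // MapReduce text output is tab-separated: key TAB value.
                    String[] fields = line.split("\t", 2);
                    if (fields.length < 2) continue; // skip malformed lines
                    System.out.printf("%-50s|%5s%n", fields[0], fields[1]);
                }
            }
        }
    }

The column widths in the printf format (50 and 5 characters) match the ip_count generic schema applied later in this procedure.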

Procedure

  1. Double-click the first tHDFSInput component to open its Basic settings view.
  2. To reuse the centralized HDFS connection, select Repository from the Property Type list box, and then click the [...] button to open the [Repository Content] dialog box.
  3. Select the HDFS connection defined for connecting to the HDFS system and click OK.

    All the connection details are automatically filled in the respective fields.

  4. Apply the generic schema of ip_count to this component. The schema should contain two columns: host (string, 50 characters) and count (integer, 5 characters).
  5. In the File Name field, enter the path to the result file in HDFS, /user/hdp/weblog/apache_ip_cnt/part-r-00000 in this example.
  6. From the Type list, select the type of the file to read, Text File in this example.
  7. In the Basic settings view of the tLogRow component, select the Table option for better readability.
  8. Configure the other subjob in the same way, but in the second tHDFSInput component (a parsing sketch for both schemas follows this procedure):
    1. Apply the generic schema of code_count, or configure the schema of this component manually so that it contains two columns: code (integer, 5 characters) and count (integer, 5 characters).
    2. Fill the File Name field with /user/hdp/weblog/apache_code_cnt/part-r-00000.
  9. Upon completion of the component settings, press Ctrl+S to save your Job configurations.
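
As with the first subjob, the Studio handles the parsing of the result files for you. The sketch below only illustrates how one tab-separated line of each result file maps onto the two generic schemas used above. The ResultLineParser class and its method names are hypothetical, the sample line in main is illustrative rather than actual Job output, and records require Java 16 or later:

    public final class ResultLineParser {

        // Mirrors the ip_count schema: host (string, 50) and count (integer, 5).
        record IpCount(String host, int count) {}

        // Mirrors the code_count schema: code (integer, 5) and count (integer, 5).
        record CodeCount(int code, int count) {}

        static IpCount parseIpCount(String line) {
            String[] f = line.split("\t", 2);
            return new IpCount(f[0], Integer.parseInt(f[1].trim()));
        }

        static CodeCount parseCodeCount(String line) {
            String[] f = line.split("\t", 2);
            return new CodeCount(Integer.parseInt(f[0].trim()),
                                 Integer.parseInt(f[1].trim()));
        }

        public static void main(String[] args) {
            // Illustrative line in the part-r-00000 layout, not real output.
            System.out.println(parseCodeCount("404\t12"));
        }
    }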