Discovering the scenario - 6.3

Talend Real-time Big Data Platform Studio User Guide

In this example, several Talend Big Data components are used to take advantage of the open-source Hadoop platform for handling big data. The scenario uses six Jobs:

  • The first Job sets up an HCatalog database, table, and partition in HDFS.

  • The second Job uploads the access log file to HDFS for analysis.

  • The third Job connects to the HCatalog database and displays the content of the uploaded file on the console.

  • The fourth Job parses the uploaded access log file: it removes any records with a "404" error, counts the code occurrences in successful service calls to the website, sorts the result data, and saves it in HDFS.

  • The fifth Job parses the uploaded access log file: it removes any records with a "404" error, counts the IP address occurrences in successful service calls to the website, sorts the result data, and saves it in HDFS.

  • The last Job reads the result data from HDFS and displays the IP addresses with successful service calls and their number of visits to the website on the standard system console.
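The filter, count, and sort logic of the fourth and fifth Jobs can be illustrated outside Talend. The following is a minimal plain-Python sketch, not Talend-generated code (in Studio these Jobs are assembled graphically from components). It assumes the access log uses the Apache Common Log Format and that "code" means the HTTP status code; the names `LOG_PATTERN`, `parse_line`, and `count_field` are hypothetical, and the result is printed instead of being saved to HDFS.

```python
import re
from collections import Counter

# Assumption: lines follow the Apache Common Log Format, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<code>\d{3}) \S+'
)


def parse_line(line):
    """Return (ip, code) for a well-formed line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return match.group("ip"), match.group("code")


def count_field(lines, field):
    """Drop 404 records, count occurrences of `field` ("ip" or "code"),
    and return (value, count) pairs sorted by descending count."""
    if field not in ("ip", "code"):
        raise ValueError("field must be 'ip' or 'code'")
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip malformed lines
        ip, code = parsed
        if code == "404":
            continue  # mirrors the "remove 404 records" step
        counts[ip if field == "ip" else code] += 1
    # Sort by count (descending), then by value so ties have a stable order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data, for illustration only
sample = [
    '10.0.0.1 - - [01/Jan/2017:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2017:00:00:02 +0000] "GET /x HTTP/1.1" 404 0',
    '10.0.0.1 - - [01/Jan/2017:00:00:03 +0000] "GET /a HTTP/1.1" 301 0',
    '10.0.0.3 - - [01/Jan/2017:00:00:04 +0000] "GET / HTTP/1.1" 200 512',
]

print(count_field(sample, "code"))  # [('200', 2), ('301', 1)]
print(count_field(sample, "ip"))    # [('10.0.0.1', 2), ('10.0.0.3', 1)]
```

Passing `"code"` corresponds to the fourth Job's result and `"ip"` to the fifth Job's; printing the IP counts is roughly what the last Job shows on the console.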