Create the fourth Job - 7.2

Big Data Job Examples

Version: 7.2
Language: English (United States)
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Open Studio for Big Data, Talend Real-Time Big Data Platform
Module: Talend Studio

Follow these steps to create the fourth Job, which analyzes the uploaded log file to count the occurrences of each response code in successful calls to the website.

Procedure

  1. Create a new Job and name it D_Pig_Count_Codes to identify its role and execution order among the example Jobs.
  2. Drop the following components from the Palette to the design workspace:
    • a tPigLoad, to load the data to be analyzed,

    • a tPigFilterRow, to remove records with the '404' error from the input flow,

    • a tPigFilterColumns, to select the columns you want to include in the result data,

    • a tPigAggregate, to count the number of visits to the website,

    • a tPigSort, to sort the result data, and

    • a tPigStoreResult, to save the result to HDFS.

  3. Connect these components using Row > Pig Combine connections to form a Pig chain, and label them to better identify their functionality.
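
The data flow formed by this Pig chain can be sketched in plain Python to show what each component contributes. This is only an illustration, not the Pig code that the Studio generates: the record layout (host, request, code), the sample data, the function name count_codes, and the sort order (descending by count) are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical sample records standing in for the uploaded log file.
# Assumed layout: (host, request, response code).
SAMPLE_LOG = [
    ("10.0.0.1", "/index.html", "200"),
    ("10.0.0.2", "/index.html", "200"),
    ("10.0.0.3", "/missing", "404"),
    ("10.0.0.1", "/old-page", "302"),
    ("10.0.0.4", "/about.html", "200"),
    ("10.0.0.2", "/old-page", "302"),
    ("10.0.0.5", "/missing", "404"),
]


def count_codes(records):
    """Mimic the chain: filter rows, filter columns, aggregate, sort."""
    # tPigLoad: the records are assumed to be loaded already.
    # tPigFilterRow: remove records with the '404' error.
    successful = [r for r in records if r[2] != "404"]
    # tPigFilterColumns: keep only the column needed in the result.
    codes = [r[2] for r in successful]
    # tPigAggregate: count the occurrences of each code.
    counts = Counter(codes)
    # tPigSort: order by count, highest first (assumed order),
    # breaking ties by code so the result is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # tPigStoreResult would write to HDFS; here the result is printed.
    for code, count in count_codes(SAMPLE_LOG):
        print(code, count)
```

In the actual Job, each of these stages is configured in the corresponding component's Basic settings view rather than written by hand.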