Create the fourth Job - 7.0

Big Data Job Examples

author: Talend Documentation Team
EnrichVersion: 7.0
EnrichProdName: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Open Studio for Big Data, Talend Real-Time Big Data Platform
task: Design and Development > Designing Jobs; Design and Development > Designing Jobs > Hadoop distributions; Design and Development > Designing Jobs > Job Frameworks > Standard
EnrichPlatform: Talend Studio

Follow these steps to create the fourth Job, which analyzes the uploaded log file to count the occurrences of each code in successful calls to the website.

Procedure

  1. Create a new Job and name it D_Pig_Count_Codes to identify its role and execution order among the example Jobs.
  2. Drop the following components from the Palette onto the design workspace:
    • a tPigLoad, to load the data to be analyzed,

    • a tPigFilterRow, to remove records with the '404' error from the input flow,

    • a tPigFilterColumns, to select the columns you want to include in the result data,

    • a tPigAggregate, to count the number of visits to the website,

    • a tPigSort, to sort the result data, and

    • a tPigStoreResult, to save the result to HDFS.

  3. Connect these components using Row > Pig Combine connections to form a Pig chain, and label them to better identify their functionality.
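
The data flow of this Pig chain can be sketched outside the Studio to make the role of each component easier to follow. The Python example below is only an illustration, not what the Job generates: the field names (host, code, url), the sample records, and the sort order (descending count, then code) are assumptions, and the final store to HDFS is replaced by returning the result in memory.

```python
from collections import Counter


def count_codes(records):
    """Mimic the Pig chain on an in-memory list of log records.

    Each record is a dict; the field names used here are hypothetical.
    """
    # tPigLoad: here the "loaded" data is simply the records argument.

    # tPigFilterRow: remove records with the '404' error.
    kept = (r for r in records if r["code"] != "404")

    # tPigFilterColumns: keep only the column needed for the result.
    codes = (r["code"] for r in kept)

    # tPigAggregate: count the occurrences of each remaining code.
    counts = Counter(codes)

    # tPigSort: order by descending count, then by code (assumed order).
    # tPigStoreResult: the sorted list stands in for the HDFS output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample data standing in for the uploaded log file.
sample = [
    {"host": "10.0.0.1", "code": "200", "url": "/index"},
    {"host": "10.0.0.2", "code": "404", "url": "/missing"},
    {"host": "10.0.0.3", "code": "200", "url": "/about"},
    {"host": "10.0.0.1", "code": "302", "url": "/old"},
    {"host": "10.0.0.4", "code": "200", "url": "/index"},
    {"host": "10.0.0.5", "code": "404", "url": "/gone"},
]

result = count_codes(sample)
for code, visits in result:
    print(code, visits)
```

Each commented stage corresponds to one component in the chain, in the same order in which the components are connected in the Job.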