Aggregating the extracted information - 6.5

Talend Real-Time Big Data Platform Getting Started Guide

author
Talend Documentation Team

Procedure

  1. Double-click tAggregateRow to open its Component view. This component allows you to find out the most popular activity recorded in the received messages.
  2. Click the [...] button next to Edit schema to open the schema editor.
  3. On the output side (right), click the [+] button three times to add three rows and in the Column column, rename them to activity, gender and popularity, respectively.
  4. In the Type column of the popularity row of the output side, select Double.
  5. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
  6. In the Group by table, add two rows by clicking the [+] button twice and configure these two rows as follows to group the output data:
    • Output column: select the columns from the output schema to be used as the conditions to group the output data. In this scenario, they are activity and gender.

    • Input column position: select the columns from the input schema that send data to the output columns you have selected in the Output column column. In this scenario, they are activity and gender.

  7. In the Operations table, add one row by clicking the [+] button once and configure this row as follows to calculate the popularity of each activity:
    • Output column: select the column from the output schema to carry the calculated results. In this scenario, it is popularity.

    • Function: select the function to be used to process the incoming data. In this scenario, select count. It counts the frequency of each activity in the received messages.

    • Input column position: select the column from the input schema to provide the data to be processed. In this scenario, it is activity.

  8. Press F6 to run this Job.
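
The grouping and counting configured in steps 6 and 7 can be sketched outside the Studio as follows. This is a minimal illustration in Python, not the code that the Studio generates for tAggregateRow. The function name aggregate_rows and the sample records are hypothetical; only the column names activity, gender and popularity come from the procedure above.

```python
from collections import Counter


def aggregate_rows(rows):
    """Group rows by (activity, gender) and count the rows in each group."""
    counts = Counter((row["activity"], row["gender"]) for row in rows)
    # popularity is emitted as a float to mirror the Double type set in the schema
    return [
        {"activity": activity, "gender": gender, "popularity": float(n)}
        for (activity, gender), n in counts.items()
    ]


# Hypothetical input records, made up for illustration only
sample = [
    {"activity": "Drink", "gender": "M"},
    {"activity": "Drink", "gender": "F"},
    {"activity": "Drink", "gender": "M"},
    {"activity": "Walk", "gender": "F"},
]

result = aggregate_rows(sample)
for row in result:
    print(row)
```

Each output row carries one combination of the two Group by columns, plus the number of input rows that share that combination.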

Results

Once done, the Run view opens automatically, and you can check the execution result there.

The result shows that Drink is the most popular activity in the messages, with 3 occurrences for the gender M (Male) and 1 occurrence for the gender F (Female).
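
To see how such a conclusion can be drawn from aggregated rows, the following sketch sums the popularity values per activity and picks the highest total. It is only an illustration under stated assumptions: the function name most_popular_activity is hypothetical, the two Drink rows reuse the counts reported above, and the Walk row is invented to give the comparison something to rank against.

```python
from collections import defaultdict


def most_popular_activity(aggregated):
    """Sum popularity across genders and return the (activity, total) pair with the highest total."""
    totals = defaultdict(float)
    for row in aggregated:
        totals[row["activity"]] += row["popularity"]
    return max(totals.items(), key=lambda item: item[1])


aggregated = [
    {"activity": "Drink", "gender": "M", "popularity": 3.0},  # count reported in the result
    {"activity": "Drink", "gender": "F", "popularity": 1.0},  # count reported in the result
    {"activity": "Walk", "gender": "F", "popularity": 1.0},   # hypothetical row, for comparison only
]

winner = most_popular_activity(aggregated)
print(winner)
```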

The Storm topology continues to run, waiting for messages to appear on the Kafka message broker until you kill the Job. In this scenario, because the Kill topology on quitting Talend Job check box is selected, the Storm topology will be stopped and removed from the cluster when this Job is stopped.