Scenario: Retrieving the different ages and lowest age data - 6.3

Talend Open Studio for Big Data Components Reference Guide

This scenario displays the number of occurrences of different ages and the lowest age within a group of customers. In this scenario, the customer data is entered manually.

You will see two ways to use the data memorized by the tMemorizeRows component:

  • Inside the same subjob (with the tJavaFlex)

  • Outside the tMemorizeRows subjob (with the tJava)

This Job uses five components:

  • tFixedFlowInput: it contains rows of customer data such as IDs, names and ages of the customers.

  • tSortRow: it sorts the rows according to the age data.

  • tMemorizeRows: it temporarily memorizes a specific number of incoming data rows at any given time and indexes the memorized data rows.

  • tJavaFlex: it compares the age values of the data memorized by the preceding component, counts the occurrences of different ages and displays these ages in the Run view.

  • tJava: it displays the number of occurrences of different ages and the lowest age.

To replicate this scenario, proceed as follows:

Dropping and linking the components

  1. Drop a tFixedFlowInput, a tSortRow, a tMemorizeRows, a tJavaFlex and a tJava component by typing their names in the design workspace or dropping them from the Palette.

  2. Connect the tFixedFlowInput component to the tSortRow component using a Row > Main connection.

  3. Do the same to link the tSortRow component to the tMemorizeRows component and the tMemorizeRows component to the tJavaFlex component.

  4. Connect the tFixedFlowInput component to the tJava component using the Trigger > OnSubjobOk connection.

Configuring the components

Configuring the tFixedFlowInput component

  1. Double-click the tFixedFlowInput component to open its Basic settings view on the Component tab.

  2. Click the [...] button next to Edit schema to open the [Schema] dialog box and define the data structure of the input data.

  3. In this editor, click the [+] button three times to add three columns and name them id, name and age.

  4. In the Type column, select Integer for id and age.

  5. Click OK to close the editor, then click Yes to validate these changes and accept the propagation prompted by the dialog box that pops up.

  6. Select Use Inline Content (delimited file) in the Mode area.

    In the Content field, enter the following customer data:

    1;Judy;27
    2;Lily;45
    3;Peter;59
    4;John;30
    5;Teddy;45
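
To make the shape of this input concrete, here is a minimal sketch in plain Java of how the inline content maps onto the three schema columns. It is not the code that the Studio generates: the class InlineContentSketch and the holder type Customer are hypothetical names, and the sketch assumes a newline as row separator and a semicolon as field separator, as suggested by the data above.

```java
import java.util.ArrayList;
import java.util.List;

class InlineContentSketch {
    // Hypothetical holder for one customer row (id, name, age); not a Talend class.
    static final class Customer {
        final Integer id;
        final String name;
        final Integer age;

        Customer(Integer id, String name, Integer age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    // Splits the content into rows on "\n" and into fields on ";" (assumed separators).
    static List<Customer> parse(String content) {
        List<Customer> rows = new ArrayList<>();
        for (String line : content.split("\n")) {
            String[] fields = line.trim().split(";");
            rows.add(new Customer(Integer.valueOf(fields[0]), fields[1], Integer.valueOf(fields[2])));
        }
        return rows;
    }

    public static void main(String[] args) {
        String content = "1;Judy;27\n2;Lily;45\n3;Peter;59\n4;John;30\n5;Teddy;45";
        for (Customer c : parse(content)) {
            System.out.println(c.id + " | " + c.name + " | " + c.age);
        }
    }
}
```

The id and age fields are parsed as Integer to mirror the types selected in the schema editor.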

Configuring the tSortRow component

  1. Double-click the tSortRow to open its Basic settings view on the Component tab.

  2. In the Criteria table, click the [+] button to add one row.

  3. In the Schema column column, select the data column on which the sorting operation is based. In this example, select age, because the ages are the values to be compared and counted.

  4. In the sort num or alpha? column, select the type of sorting operation to perform. In this example, select num, which means numerical, as age contains integer data.

  5. In the Order asc or desc? column, select desc to display data on the console in descending order.
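
The effect of these sort settings can be sketched in plain Java as follows. This is only an illustration of a numerical, descending sort applied to the ages of this scenario, not the code behind the tSortRow component; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortDescSketch {
    // Returns a copy of the ages sorted numerically in descending order.
    static List<Integer> sortDescending(List<Integer> ages) {
        List<Integer> sorted = new ArrayList<>(ages);
        sorted.sort(Comparator.reverseOrder());
        return sorted;
    }

    public static void main(String[] args) {
        // Ages in the order they were entered in tFixedFlowInput
        System.out.println(sortDescending(Arrays.asList(27, 45, 59, 30, 45)));
        // prints [59, 45, 45, 30, 27]
    }
}
```

Because the order is descending, equal ages end up next to each other and the lowest age is the last value in the flow.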

Configuring the tMemorizeRows component

  1. Double-click the tMemorizeRows component to open its Basic settings view on the Component tab.

  2. In the Row count to memorize field, type in the maximum number of rows to be memorized at any given time. In this example, you need to compare the ages of two customers at a time, so enter 2. The component then memorizes at most two rows at any given moment and always indexes the new incoming row as 0 and the previous incoming row as 1.

  3. In the Memorize column of the Columns to memorize table, select the check box(es) to determine the column(s) to be memorized. In this example, select the check box corresponding to age.
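
The indexing behavior described above can be pictured as a small sliding window. The following plain Java sketch is an illustration under stated assumptions, not the implementation of the component: the helper memorize is a hypothetical name, and the sketch assumes that a slot that has not been filled yet holds null.

```java
class MemorizeWindowSketch {
    // Shifts every memorized value one index up and stores the new value at index 0.
    static void memorize(Integer[] window, Integer incoming) {
        for (int i = window.length - 1; i > 0; i--) {
            window[i] = window[i - 1];
        }
        window[0] = incoming;
    }

    public static void main(String[] args) {
        Integer[] ageWindow = new Integer[2];    // Row count to memorize = 2
        int[] sortedAges = {59, 45, 45, 30, 27}; // ages after the descending sort
        for (int age : sortedAges) {
            memorize(ageWindow, age);
            System.out.println("[0]=" + ageWindow[0] + " [1]=" + ageWindow[1]);
        }
    }
}
```

With a window of two rows, index 0 always holds the current age and index 1 the age of the row that came just before it.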

Configuring the tJavaFlex and tJava components

  1. Double-click the tJavaFlex component to open its Basic settings view on the Component tab.

  2. In the Start code area, enter the Java code that will be called during the initialization phase. In this example, type in int count=0; in order to declare a variable count and assign the value 0 to it.

  3. In the Main code area, enter the Java code to be applied to each row in the data flow. In this scenario, type in:

    if(!age_tMemorizeRows_1[0].equals(age_tMemorizeRows_1[1]))
    {
    count++;
    }
    System.out.println(age_tMemorizeRows_1[0]);

    For each row, this code compares the two ages memorized by the tMemorizeRows component and increments count every time the ages are different. For the first row there is no previous age to compare against, so the first age is counted as well. The code then displays the age that has been indexed as 0 by the tMemorizeRows component. When the tJavaFlex component is in the same flow as the tMemorizeRows component, the variable format is ColumnName_ComponentName[index].

  4. In the End code area, enter the Java code that will be called during the closing phase. In this example, type in globalMap.put("number", count); to store the value of the count variable in the global variable number.

  5. Double-click the tJava component to open its Basic settings view on the Component tab.

  6. In the Code area, enter the following code to display on the console the number of occurrences of different ages and the lowest age among the customers:

    System.out.println("Different ages: " + globalMap.get("number"));
    
    System.out.println("Lowest age: " + ((Integer[])globalMap.get("tMemorizeRows_1_age"))[0]);

    The tJava component uses the globalMap.get() method to retrieve the count and the array of memorized values. Note that here the tJava component is placed outside the subjob that contains the tMemorizeRows component, so the variable format is ComponentName_ColumnName, which is different from the format used by a component placed in the same flow.
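
The logic of the tJavaFlex and tJava code can be checked outside the Studio with the following self-contained sketch. It is a simulation under stated assumptions, not the code generated for the Job: a plain HashMap stands in for globalMap, the array of memorized ages is put into that map by hand under the key tMemorizeRows_1_age, and the two memorized slots are updated manually in place of the tMemorizeRows component. The class name DistinctAgesSketch and the method run are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class DistinctAgesSketch {
    static Map<String, Object> run(Integer[] sortedAgesDesc) {
        // Stand-in for the globalMap of the Job (assumption of this sketch)
        Map<String, Object> globalMap = new HashMap<>();
        Integer[] age_tMemorizeRows_1 = new Integer[2];
        globalMap.put("tMemorizeRows_1_age", age_tMemorizeRows_1);

        int count = 0; // Start code of the tJavaFlex
        for (Integer age : sortedAgesDesc) {
            // Simulated tMemorizeRows: previous row moves to index 1, new row goes to index 0
            age_tMemorizeRows_1[1] = age_tMemorizeRows_1[0];
            age_tMemorizeRows_1[0] = age;

            // Main code of the tJavaFlex
            if (!age_tMemorizeRows_1[0].equals(age_tMemorizeRows_1[1])) {
                count++;
            }
            System.out.println(age_tMemorizeRows_1[0]);
        }
        globalMap.put("number", count); // End code of the tJavaFlex
        return globalMap;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = run(new Integer[] {59, 45, 45, 30, 27});

        // Code of the tJava
        System.out.println("Different ages: " + globalMap.get("number"));
        System.out.println("Lowest age: " + ((Integer[]) globalMap.get("tMemorizeRows_1_age"))[0]);
    }
}
```

On the first row, index 1 is still null, so equals() returns false and the first age is counted. The duplicate 45 is the only row that does not increment the counter.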

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6, or click Run on the Run console to execute the Job.

In the console, you can read that there are four different ages (59, 45, 30 and 27) and that the lowest age within the group of customers is 27.