Creating the MapReduce program - 7.1

Java custom code for Map Reduce

author: Talend Documentation Team
EnrichVersion: 7.1
EnrichProdName: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Real-Time Big Data Platform
task:
  Data Governance > Third-party systems > Custom code components (Integration) > Java custom code component for Map Reduce
  Data Quality and Preparation > Third-party systems > Custom code components (Integration) > Java custom code component for Map Reduce
  Design and Development > Third-party systems > Custom code components (Integration) > Java custom code component for Map Reduce
EnrichPlatform: Talend Studio

Procedure

  1. Double-click tJavaMR to open its Component view.
  2. Under the mrKeyStruct table, click the [+] button once to add one row.
  3. Rename that row to word_mr. This is the key part of the key/value pair used by the Map/Reduce program being created. In the map method, write mrKey.word_mr to represent the keys to be output to a reducer.
  4. Under the mrValueStruct table, click the [+] button once to add one row.
  5. Rename that row to count_mr. This is the value part of the same key/value pair. In the map method, write mrValue.count_mr to represent the values to be output to a reducer.
  6. Click the [...] button next to Edit schema to open the schema editor.
  7. On the tJavaMR side of the schema editor, click the [+] button twice to add two columns and name them word_output and count_output, respectively. This defines the structure of the output data.
  8. In the Type column, select Integer for count_output.
  9. In the Map code editing field, edit the body of the map method. In this example, the code is as follows:
    
    // Read the current input record as one line of text.
    String line = value.record;
    java.util.StringTokenizer tokenizer = new java.util.StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
        // Emit one (WORD, 1) pair per whitespace-separated token.
        mrKey.word_mr = tokenizer.nextToken().toUpperCase();
        mrValue.count_mr = 1;
        output.collect(mrKey, mrValue);
    }
                   
    This method splits the input data into words, converts each word to upper case, and outputs key/value pairs such as (HELLO, 1) and (WORLD, 1) to the reducer.
    Note that at runtime, these pairs are automatically shuffled and sorted into the form (key, list of values) before being processed by the reduce method.
  10. In the Reduce code editing field, edit the body of the reduce method. In this example, the code is as follows:
    
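    // Sum all counts collected for the current key.
    int count = 0;
    while (values.hasNext()) {
        mrValueStruct value = values.next();
        count += value.count_mr;
    }
    // Map the key and its total to the columns of the output schema.
    outputRow.word_output = key.word_mr;
    outputRow.count_output = count;
    output.collect(NULL, outputRow);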
                   
    This reduce method sums the values in the list of each (key, list of values) pair and maps the results to the columns of the output schema.
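
To see how the map and reduce bodies above fit together, the following is a minimal stand-alone sketch in plain Java that imitates the three stages in memory: map, shuffle and sort, and reduce. It is an illustration only and is not the code that Talend Studio generates. The class WordCountSketch and its methods map, shuffle, reduce and wordCount are hypothetical names introduced here, and an in-memory TreeMap stands in for the shuffle and sort that the Map/Reduce framework performs at runtime.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration class: it uses no Talend or Hadoop APIs.
class WordCountSketch {

    // Map stage: same logic as the Map code, one (WORD, 1) pair per token.
    // Each entry stands in for a (mrKey.word_mr, mrValue.count_mr) pair.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokenizer.nextToken().toUpperCase(), 1));
        }
        return pairs;
    }

    // Shuffle and sort stage (assumption: simulated in memory).
    // Groups the values by key; the TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce stage: same logic as the Reduce code, summing one list of values.
    static int reduce(List<Integer> values) {
        int count = 0;
        for (int value : values) {
            count += value;
        }
        return count;
    }

    // Runs the three stages over all input lines and returns word -> count.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, counts) -> result.put(word, reduce(counts)));
        return result;
    }

    public static void main(String[] args) {
        // prints {HELLO=2, TALEND=1, WORLD=1}
        System.out.println(wordCount(Arrays.asList("hello world", "Hello Talend")));
    }
}
```

In this sketch, "hello" and "Hello" are counted under the same key because the map stage converts every token to upper case before emitting it, which mirrors the effect of toUpperCase() in the Map code of the procedure.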