Configuring the Hadoop connection of a MapReduce Job - 7.1

Talend Real-time Big Data Platform Studio User Guide


Before you can run a Talend MapReduce Job, you must configure its connection to Hadoop.


  1. In the Repository tree view of the Integration perspective of the Studio, double-click the MapReduce Job you want to run to open it in the workspace.
  2. Click the Run tab to open its view and click the Hadoop configuration tab.


    In this view, set the parameters of the connection to the Hadoop cluster you want to use.

    For further explanations of each parameter in this view, see Setting up Hadoop connection manually.

    Note that a connection created from this Hadoop configuration view applies to the current Job only. To run another Job, configure the connection for that Job from this same view.

    If you have also finalized the Job design using the components optimized for MapReduce, as explained earlier with regard to creating a MapReduce Job, the Job is now ready to run. Otherwise, finalize the design before running the Job.
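To make the per-Job scope of the connection concrete, here is a minimal plain-Java sketch. It is not code produced by the Studio, and the fields of the Hadoop configuration view are not listed in this section. It assumes a YARN cluster and uses three standard Hadoop property keys (`fs.defaultFS`, `mapreduce.framework.name`, `yarn.resourcemanager.address`); the class name, method name, and host names are hypothetical.

```java
import java.util.Properties;

// Illustrative sketch only: NOT code generated by Talend Studio.
class HadoopConnectionSketch {

    // Builds the connection settings for a single Job.
    // The keys are standard Hadoop property names; the values passed in
    // are placeholders for your own cluster (assumed to run YARN).
    static Properties connectionFor(String nameNodeUri, String resourceManagerAddress) {
        Properties props = new Properties();
        props.setProperty("fs.defaultFS", nameNodeUri);
        props.setProperty("mapreduce.framework.name", "yarn");
        props.setProperty("yarn.resourcemanager.address", resourceManagerAddress);
        return props;
    }

    public static void main(String[] args) {
        // Each Job carries its own settings; configuring one Job
        // does not change the connection of another.
        Properties jobA = connectionFor("hdfs://namenode-a.example.com:8020", "rm-a.example.com:8032");
        Properties jobB = connectionFor("hdfs://namenode-b.example.com:8020", "rm-b.example.com:8032");

        System.out.println("Job A file system: " + jobA.getProperty("fs.defaultFS"));
        System.out.println("Job B file system: " + jobB.getProperty("fs.defaultFS"));
    }
}
```

Because each call returns a separate `Properties` object, changing the settings of one Job leaves the other untouched, which mirrors the per-Job behavior described above.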


This image shows a finalized MapReduce Job with its Hadoop connection configured, which is therefore ready to run.

Click the Code tab to view the generated MapReduce code.

This image shows part of the generated code, reflecting the rejects data flow of the Job. The code shows that this Job checks the configuration information and generates different classes such as InputFormat, OutputFormat, Mapper and Reducer.

If you select one of the generated classes, for example tDenormalize_1Reducer.class, and press F3, the code of this class is displayed in a new tab.

From this view, you can read how this Reducer performs its Reduce computation.
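The generated class is not reproduced in this section. As a rough illustration of what a Reduce computation for a denormalize step might look like, the following plain-Java sketch groups values by key and joins them with a delimiter. It is an assumption-based stand-in, not the code of tDenormalize_1Reducer: it does not use the Hadoop API, and the class name, method names, and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: NOT the code that Talend Studio generates.
class DenormalizeReduceSketch {

    // Stand-in for the shuffle phase: group the values emitted by the
    // Mapper under their key. A TreeMap keeps the keys sorted.
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    // Reduce computation, invoked once per key: join all values that
    // share the key into one delimited string (assumed denormalize behavior).
    static String reduce(Iterable<String> values, String delimiter) {
        return String.join(delimiter, values);
    }

    // Runs shuffle, then reduce, and returns one denormalized value per key.
    static Map<String, String> run(List<String[]> pairs, String delimiter) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : shuffle(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue(), delimiter));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical (key, value) pairs as a Mapper might emit them.
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"fr", "Paris"});
        pairs.add(new String[] {"us", "Boston"});
        pairs.add(new String[] {"fr", "Lyon"});

        run(pairs, ",").forEach((key, value) -> System.out.println(key + ";" + value));
    }
}
```

In a real MapReduce Job, the framework performs the grouping between the Map and Reduce phases, and the Reducer only sees one key with its values at a time; the `shuffle` method here merely imitates that step so the sketch can run on its own.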