How to configure the Hadoop connection of a MapReduce Job - 6.3

Talend Data Fabric Studio User Guide

Before you can run a Talend MapReduce Job, you need to configure its connection to Hadoop. To do so, proceed as follows:

  1. In the Repository tree view of the Integration perspective of the Studio, double-click the MapReduce Job you want to run to open it in the workspace.

  2. Click the Run tab to open its view and click the Hadoop configuration tab.

    In this view, set the parameters needed to create the connection to the Hadoop cluster you want to use. For details on each parameter in this view, see Talend Big Data Getting Started Guide.

    Note that the connection created from this Hadoop configuration view applies on a per-Job basis. When you need to run another Job, you must configure the connection specific to that Job from its own Hadoop configuration view.

    If you have also finalized the Job design using the components optimized for MapReduce, as explained earlier with regard to creating a MapReduce Job, the Job is now ready to run; otherwise, you must finalize the design before running it. For further information about the components for MapReduce, see Talend Components Reference Guide.
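To illustrate what "per-Job" means for the connection, here is a minimal sketch in plain Java. It is not Talend code and does not reproduce the fields of the Hadoop configuration view. The property keys (fs.defaultFS, mapreduce.framework.name, yarn.resourcemanager.address) are standard Hadoop property names; the class name, the Job names, and every host name and port are hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PerJobHadoopSettings {

    // Builds the connection settings for a single Job.
    // The keys are standard Hadoop property names; the values passed in
    // are assumptions made for this illustration only.
    static Map<String, String> connectionFor(String nameNodeUri, String resourceManager) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("fs.defaultFS", nameNodeUri);
        settings.put("mapreduce.framework.name", "yarn");
        settings.put("yarn.resourcemanager.address", resourceManager);
        return settings;
    }

    public static void main(String[] args) {
        // Each Job keeps its own settings; configuring one Job does not
        // configure another. Job names and hosts are hypothetical.
        Map<String, Map<String, String>> settingsByJob = new LinkedHashMap<>();
        settingsByJob.put("job_a",
                connectionFor("hdfs://namenode-a.example.com:8020", "rm-a.example.com:8032"));
        settingsByJob.put("job_b",
                connectionFor("hdfs://namenode-b.example.com:8020", "rm-b.example.com:8032"));

        for (Map.Entry<String, Map<String, String>> job : settingsByJob.entrySet()) {
            System.out.println(job.getKey() + " -> " + job.getValue());
        }
    }
}
```

In the Studio these values are entered through the Hadoop configuration view rather than written by hand; the sketch only shows that each Job carries its own independent set of connection values.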

This image presents a finalized MapReduce Job with its connection to Hadoop configured, and therefore ready to run.

Click the Code tab to open its view and see the generated MapReduce code.

This image shows part of the generated code, which reflects the rejects data flow of the Job. In it, you can see that this Job checks the configuration information and generates different classes such as InputFormat, OutputFormat, Mapper and Reducer.

If you select one of the generated classes, for example tDenormalize_1Reducer.class, and press F3, the code of this class is displayed in a new tab as follows:

From this view, you can read how this Reducer performs its Reduce computation.
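As a rough idea of what a denormalizing Reduce step does, the following sketch simulates the map, group and reduce phases in plain Java with no Hadoop dependency. It is not the code Talend generates for tDenormalize_1Reducer: the class name, the method names, the "key;value" input layout and the comma delimiter are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DenormalizeSketch {

    // Map phase: split one "key;value" line into a key/value pair.
    // Assumes every line contains at least one ';' (illustration only).
    static String[] map(String line) {
        String[] parts = line.split(";", 2);
        return new String[] { parts[0].trim(), parts[1].trim() };
    }

    // Grouping phase: collect all values that share a key, as the
    // framework would do between the Mapper and the Reducer.
    static Map<String, List<String>> shuffle(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] pair = map(line);
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    // Reduce phase: join all values of one key into a single output row.
    static String reduce(String key, List<String> values, String delimiter) {
        return key + ";" + String.join(delimiter, values);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("fr;Paris", "us;Boston", "fr;Lyon", "us;Austin");
        for (Map.Entry<String, List<String>> entry : shuffle(input).entrySet()) {
            System.out.println(reduce(entry.getKey(), entry.getValue(), ","));
        }
        // prints:
        // fr;Paris,Lyon
        // us;Boston,Austin
    }
}
```

In a real MapReduce Job the grouping is performed by the Hadoop framework, and the Reducer only receives one key together with its values; the sketch keeps everything in memory so the data flow can be followed in one file.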