Convert a Map Reduce Job to a Spark Job - 7.1

Talend Data Fabric
Talend MDM Platform
Talend Studio
Design and Development > Designing Jobs > Job Frameworks > MapReduce
Design and Development > Designing Jobs > Job Frameworks > Spark Batch

Convert a Map Reduce Job to a Spark Job

This article shows how to convert a Talend Big Data Batch Job that uses the MapReduce framework to a Talend Big Data Batch Job that uses the Spark framework.

This example uses Talend Big Data Real Time Platform 6.1.

Talend Studio allows you to convert Jobs from one framework to another. Because Spark offers faster in-memory processing, this example converts a MapReduce Job to a Spark Job.

In this example, it is assumed that a Big Data Batch Job using the MapReduce framework already exists in your Repository and that you can run it successfully:
  1. In the Repository, right-click your Job and click Duplicate.
  2. Name your new Job. Then, in the Job Type list, keep the Big Data Batch option and, in the Framework list, select Spark:

    Note: Using the same procedure, you can duplicate your Job as a Standard, Big Data Batch, or Big Data Streaming Job. Depending on the Job type, you can then select the framework of your choice.

  3. Click OK. Your Job appears in the Repository. Double-click the Job to open it in the Designer:
  4. A tHDFSConfiguration component has been added automatically. Note that Spark does not depend on a particular file system, so the file system used for storage must be defined with a dedicated component such as tHDFSConfiguration or tS3Configuration.
  5. To review the cluster connection metadata, double-click the tHDFSConfiguration component.
  6. As for Big Data Batch – MapReduce Jobs, the connection to the cluster is configured in the Run view. In the Run view, click the Spark Configuration tab to see the cluster connection information from the Repository:
  7. Run your Job and follow the execution in the Designer or in the Console, in the same way as for Big Data Batch – MapReduce Jobs:
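For reference, the settings managed by tHDFSConfiguration and the Spark Configuration tab correspond to standard Spark and Hadoop properties. A minimal sketch of roughly equivalent settings in a spark-defaults.conf file might look like the following; the host name, port, and memory value are placeholder assumptions, not values taken from this example:

```
# Hypothetical cluster endpoints -- replace with your own values.
spark.master                    yarn

# spark.hadoop.* properties are passed through to the underlying Hadoop
# configuration; fs.defaultFS plays the role of the NameNode URI that
# tHDFSConfiguration defines in the Studio.
spark.hadoop.fs.defaultFS       hdfs://namenode.example.com:8020

# Per-executor memory, comparable to the tuning options available in the
# Spark Configuration tab of the Run view.
spark.executor.memory           2g
```

In the Studio, you normally do not edit such a file yourself: the Spark Configuration tab and the storage configuration component generate the corresponding settings for you.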


When converting a Job from one type to another, or from one framework to another, make sure that all components have loaded successfully before running the Job. Note that the Palette and the list of available components change depending on the Job type and the framework used.