Creating a Big Data Batch Job to use Spark or YARN

For Big Data processing, Talend Studio lets you create Batch Jobs and Streaming Jobs that run on Spark or MapReduce.

Before you begin

Select the Integration perspective (Window > Perspective > Integration).

Procedure

  1. In Repository, right-click Job Designs, then click Create Big Data Batch Job.
  2. In the Name field, enter a name.

    Example

    ReadHDFS_Spark_or_YARN
  3. Select a Framework.
    • Spark
    • MapReduce (deprecated)
  4. Optional: In the Purpose field, enter a purpose.

    Example

    Read and sort customer data
  5. Optional: In the Description field, enter a description.

    Example

    Read and sort customer data stored in HDFS from a Big Data Batch Job running on Spark or YARN
    Tip: Enter a Purpose and Description to stay organized.
  6. Click Finish.

Results

The Designer opens an empty Job.
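
The Job opens empty; you then add and configure components in the Designer to build its logic. For orientation only, the following standalone Java snippet is a rough sketch of what the example Job (reading and sorting customer data stored in HDFS, running on Spark) amounts to. It is not code generated by Talend Studio; the HDFS path, the CSV file format, and the "name" sort column are illustrative assumptions.

    // Minimal Spark sketch, not Talend-generated code.
    // Assumptions: customer data is a CSV file with a header row in HDFS,
    // and it is sorted by a "name" column.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ReadHdfsSparkSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("ReadHDFS_Spark_or_YARN_sketch")
                    .getOrCreate();

            // Read the assumed customer CSV file from HDFS.
            Dataset<Row> customers = spark.read()
                    .option("header", "true")
                    .csv("hdfs:///data/customers.csv");

            // Sort by the assumed "name" column and print a sample.
            customers.orderBy("name").show();

            spark.stop();
        }
    }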
