Creating a Big Data Batch Job to use Spark or YARN

For Big Data processing, Talend Studio lets you create Batch Jobs and Streaming Jobs that run on Spark or MapReduce.

Before you begin

Select the Integration perspective (Window > Perspective > Integration).

Procedure

  1. In Repository, right-click Job Designs, then click Create Big Data Batch Job.
  2. In the Name field, enter a name.

    Example

    ReadHDFS_Spark_or_YARN
  3. Select a Framework.
    • Spark
    • MapReduce (deprecated)
  4. Optional: In the Purpose field, enter a purpose.

    Example

    Read and sort customer data
  5. Optional: In the Description field, enter a description.

    Example

    Read and sort customer data stored in HDFS from a Big Data Batch Job running on Spark or YARN
    Tip: Enter a Purpose and Description to stay organized.
  6. Click Finish.

Results

The Designer opens an empty Job.
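
The Job opens empty; you then add and configure components in the Designer to build its logic. For orientation only, the following standalone Java snippet is a rough sketch of what the example Job (reading and sorting customer data stored in HDFS, running on Spark) amounts to. It is not code generated by Talend Studio; the HDFS path, the CSV file format, and the "name" sort column are illustrative assumptions.

    // Minimal Spark sketch, not Talend-generated code.
    // Assumptions: customer data is a CSV file with a header row in HDFS,
    // and it is sorted by a "name" column.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ReadHdfsSparkSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("ReadHDFS_Spark_or_YARN_sketch")
                    .getOrCreate();

            // Read the assumed customer CSV file from HDFS.
            Dataset<Row> customers = spark.read()
                    .option("header", "true")
                    .csv("hdfs:///data/customers.csv");

            // Sort by the assumed "name" column and print a sample.
            customers.orderBy("name").show();

            spark.stop();
        }
    }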
