How to Use Spark 2.0 with Studio 6.3 - 6.3

When Talend Studio version 6.3 was released, many Hadoop distributions officially supported by Talend did not provide native support for Spark 2.0. However, if you install Spark 2.0 in your cluster on your own, you can still use it with your Talend Job.

Environment:

  • A subscription-based Talend solution with Big Data, version 6.3.
  • A Hadoop cluster officially supported by Talend Studio 6.3.
  • A cluster that does not natively support Spark 2.0.
  • Spark 2.0 installed in the cluster.

In order to run, a Talend Spark Job requires all related dependencies.

Before Spark 2.0, Spark natively provided a consolidated jar file (also known as an assembly) that bundled all required dependencies. In Spark 2.0, this jar file no longer exists. To run a Talend Spark Job against a Spark 2.0 installation that you set up yourself, you need to configure the Spark Job so that it can find these dependencies.
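
As a rough illustration of what such a configuration can look like, the sketch below builds a Spark property that points a Job at a directory holding the Spark 2.0 jars. It assumes the cluster runs Spark on YARN, where the standard Spark property `spark.yarn.jars` accepts a location with a glob. The helper name and the HDFS path are hypothetical and not taken from the Talend documentation; substitute the location where your own Spark 2.0 jars are stored.

```python
def spark2_dependency_properties(jars_dir):
    """Build Spark properties that tell a YARN job where the Spark 2.0 jars live.

    jars_dir is an assumed location (for example, an HDFS directory) that
    contains a copy of the jars shipped in the Spark 2.0 installation.
    """
    base = jars_dir.rstrip("/")
    # spark.yarn.jars accepts globs, so "/*" selects every jar in the directory.
    return {"spark.yarn.jars": base + "/*"}


# Hypothetical HDFS directory, used only for illustration.
props = spark2_dependency_properties("hdfs:///user/spark/share/lib/spark2/")

# Print the properties as key=value pairs.
for key, value in props.items():
    print(f"{key}={value}")
```

The resulting key/value pair is the kind of entry you would supply wherever your Job accepts additional Spark properties.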