Running a Job with Spark Universal - Cloud - 8.0

Talend Studio User Guide

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-04-16
Available in:
  • Big Data
  • Big Data Platform
  • Cloud Big Data
  • Cloud Big Data Platform
  • Cloud Data Fabric
  • Data Fabric
  • Real-Time Big Data Platform

Spark Universal is a mechanism that allows Talend Studio to be compatible with every big data distribution for a given Spark version. You choose a Spark version and upload a Hadoop configuration JAR file that contains all the necessary information to connect to your cluster.

Spark Universal modes and environments support

Talend Studio supports the following modes and environments, depending on the Spark versions:
Mode or environment        Spark 2.4.x    Spark 3.0.x    Spark 3.1.x    Spark 3.2.x    Spark 3.3.x    Spark 3.4.x
Local mode                 Supported      Supported      Supported      Supported      Supported      Supported
Standalone                 Not supported  Not supported  Not supported  Supported      Not supported  Supported
Yarn cluster mode          Supported      Supported      Supported      Supported      Supported      Not supported
Databricks                 Not supported  Not supported  Supported      Supported      Supported      Supported
Dataproc                   Not supported  Not supported  Supported      Supported      Supported      Not supported
Cloudera Data Engineering  Not supported  Not supported  Supported      Supported      Not supported  Not supported
Kubernetes                 Not supported  Not supported  Supported      Not supported  Not supported  Not supported
Spark-submit scripts       Not supported  Not supported  Not supported  Not supported  Supported      Not supported
Synapse                    Not supported  Not supported  Not supported  Supported      Supported      Not supported
HDInsight                  Not supported  Not supported  Supported      Not supported  Not supported  Not supported
EMR Serverless             Not supported  Not supported  Not supported  Supported      Supported      Not supported
Note:
  • Azure Synapse Analytics with Spark Universal 3.2.x and 3.3.x is supported only in Spark Batch Jobs.
  • Spark-submit scripts with Spark Universal 3.3.x are supported only in Spark Batch Jobs.
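
The support table and the notes above can also be expressed as a small lookup, which may be convenient when scripting a check before choosing a Spark version. The following Python sketch is illustrative only: the names SUPPORT, BATCH_ONLY, is_supported, and supported_versions are hypothetical and are not part of Talend Studio. The values are transcribed from the table above.

```python
# Spark Universal versions, in the same order as the table columns.
SPARK_VERSIONS = ["2.4.x", "3.0.x", "3.1.x", "3.2.x", "3.3.x", "3.4.x"]

# True means "Supported" in the table above, False means "Not supported".
SUPPORT = {
    "Local mode":                [True,  True,  True,  True,  True,  True],
    "Standalone":                [False, False, False, True,  False, True],
    "Yarn cluster mode":         [True,  True,  True,  True,  True,  False],
    "Databricks":                [False, False, True,  True,  True,  True],
    "Dataproc":                  [False, False, True,  True,  True,  False],
    "Cloudera Data Engineering": [False, False, True,  True,  False, False],
    "Kubernetes":                [False, False, True,  False, False, False],
    "Spark-submit scripts":      [False, False, False, False, True,  False],
    "Synapse":                   [False, False, False, True,  True,  False],
    "HDInsight":                 [False, False, True,  False, False, False],
    "EMR Serverless":            [False, False, False, True,  True,  False],
}

# Combinations that the notes restrict to Spark Batch Jobs.
BATCH_ONLY = {
    ("Synapse", "3.2.x"),
    ("Synapse", "3.3.x"),
    ("Spark-submit scripts", "3.3.x"),
}


def is_supported(mode, spark_version, job_type="batch"):
    """Return True if the table lists `mode` as supported for `spark_version`.

    `job_type` is a hypothetical label ("batch" or "streaming") used only to
    apply the batch-only restrictions from the notes.
    """
    if mode not in SUPPORT:
        raise KeyError(f"unknown mode or environment: {mode}")
    if spark_version not in SPARK_VERSIONS:
        raise KeyError(f"unknown Spark version: {spark_version}")
    supported = SUPPORT[mode][SPARK_VERSIONS.index(spark_version)]
    if supported and (mode, spark_version) in BATCH_ONLY:
        return job_type == "batch"
    return supported


def supported_versions(mode):
    """List the Spark versions for which `mode` is marked as supported."""
    return [v for v, ok in zip(SPARK_VERSIONS, SUPPORT[mode]) if ok]
```

For instance, supported_versions("Standalone") returns the versions marked Supported in the Standalone row, and is_supported("Synapse", "3.2.x", job_type="streaming") returns False because of the batch-only note.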

Spark Universal distributions support

Talend Studio supports the following distributions in Yarn cluster mode, depending on the Spark versions:
Spark version  Supported distributions in Yarn cluster mode
Spark 2.4.x    Amazon EMR 5.2.x and above; CDH 6.x; HDP 3.x
Spark 3.0.x    Amazon EMR 6.2; CDP 7.1
Spark 3.1.x    Amazon EMR 6.3.x, 6.4.x and 6.5.x
Spark 3.2.x    Amazon EMR 6.6.0 and 6.7.0
Spark 3.3.x    Amazon EMR 6.8.0, 6.9.0 and 6.10.0; CDP Private Cloud Base 7.1.8 and 7.1.9

For example, to connect to an Amazon EMR 6.2 cluster, select the Spark 3.0.x version and then upload the Hadoop configuration JAR file that contains all the *-site.xml files related to the cluster.
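
A JAR file is a ZIP archive, so the configuration JAR mentioned in the example above could be assembled with a short script. The following Python sketch makes several assumptions: the function name build_hadoop_conf_jar is hypothetical, the *-site.xml files are assumed to have already been copied from the cluster into a local directory, and they are stored at the root of the archive without a manifest. This section does not specify the exact layout that Talend Studio expects, so check it against your environment.

```python
import zipfile
from pathlib import Path


def build_hadoop_conf_jar(conf_dir, jar_path):
    """Package every *-site.xml file found in conf_dir into a JAR at jar_path.

    Returns the sorted list of file names that were added.
    """
    site_files = sorted(Path(conf_dir).glob("*-site.xml"))
    if not site_files:
        raise FileNotFoundError(f"no *-site.xml files found in {conf_dir}")
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        for xml_file in site_files:
            # Assumption: files sit at the archive root, with no subdirectories.
            jar.write(xml_file, arcname=xml_file.name)
    return [f.name for f in site_files]
```

A call such as build_hadoop_conf_jar("cluster-conf", "hadoop-conf.jar"), where both paths are placeholders, would produce a JAR containing files such as core-site.xml and yarn-site.xml if they are present in the directory.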

This list of distributions is not exhaustive. You can use Yarn cluster mode with any other distribution if the Spark version matches, but such distributions have not been officially tested by Talend and are therefore not guaranteed to work.