Running a Job with Spark Universal - Cloud - 8.0

Talend Data Fabric Studio User Guide

Version: Cloud 8.0
Language: English (United States)
Product: Talend Data Fabric
Module: Talend Studio
Content: Design and Development

Spark Universal is a mechanism that makes Talend Studio compatible with any big data distribution that runs a given Spark version. You choose a Spark version and upload a Hadoop configuration JAR file that contains all the information needed to connect to your cluster. With Spark Universal, you can also switch between Spark modes, distributions or environments simply by changing the Hadoop configuration JAR file. For more information about switching, see Switching between modes, distributions or environments.
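
The Hadoop configuration JAR is an ordinary JAR (ZIP) archive that carries the cluster's *-site.xml files, such as core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml. The following Python sketch shows one way to package such a file from a directory of site files copied from the cluster. It assumes that Talend Studio expects the *-site.xml files at the root of the archive; the function name and the minimal manifest are illustrative choices, not a Talend tool, so check the Talend documentation for the exact layout it requires.

```python
import sys
import zipfile
from pathlib import Path
from typing import List


def build_hadoop_config_jar(conf_dir: str, jar_path: str) -> List[str]:
    """Package every *-site.xml file in conf_dir at the root of a JAR.

    Hypothetical helper: assumes the site files belong at the archive root.
    Returns the names of the packaged site files.
    """
    site_files = sorted(Path(conf_dir).glob("*-site.xml"))
    if not site_files:
        raise FileNotFoundError(f"no *-site.xml files found in {conf_dir}")
    with zipfile.ZipFile(jar_path, "w", compression=zipfile.ZIP_DEFLATED) as jar:
        # A minimal manifest, for tools that expect one in a JAR.
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\r\n\r\n")
        for site_file in site_files:
            # Store each file under its bare name, without directories.
            jar.write(site_file, arcname=site_file.name)
    return [site_file.name for site_file in site_files]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python build_conf_jar.py <conf_dir> <output.jar>
    print(build_hadoop_config_jar(sys.argv[1], sys.argv[2]))
```

Non-matching files in the directory, such as notes or scripts, are skipped because only the *-site.xml pattern is collected.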

Spark Universal modes and environments compatibility

Talend Studio is compatible with the following modes and environments, depending on the Spark versions:
Spark versions: Spark 2.4.x, Spark 3.0.x, Spark 3.1.x, Spark 3.2.x
Modes and environments:
  • Local mode
  • Standalone
  • Yarn cluster mode
  • Databricks
  • Dataproc
  • Cloudera Data Engineering

Spark Universal distributions compatibility

Talend Studio is compatible with the following distributions in Yarn cluster mode, depending on the Spark versions:
Spark 2.4.x
  • Amazon EMR 5.2.x and above
  • CDH 6.x
  • HDP 3.x
Spark 3.0.x
  • Amazon EMR 6.2
  • CDP 7.1
Spark 3.1.x
  • Amazon EMR 6.3.x, 6.4.x and 6.5.x

For example, to connect to an Amazon EMR 6.2 cluster, select the Spark 3.0.x version and then upload the Hadoop configuration JAR file that contains all the *-site.xml files related to the cluster.
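
The distribution list above works as a lookup table from a cluster distribution and version to the Spark version to select. The following Python sketch encodes it so that, as in the Amazon EMR 6.2 example, a distribution and version map to a Spark Universal version. The rule table and function names are hypothetical, not part of Talend Studio. Reading "Amazon EMR 5.2.x and above" as stopping before EMR 6.0 is an assumption, based on EMR 6.x being listed under Spark 3.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]

# (distribution, lowest version included, first version excluded, Spark version),
# transcribed from the Yarn cluster mode list. The (6, 0) upper bound for
# Amazon EMR 5.x is an assumption, not stated in the list.
RULES = [
    ("Amazon EMR", (5, 2), (6, 0), "2.4.x"),
    ("CDH", (6,), (7,), "2.4.x"),
    ("HDP", (3,), (4,), "2.4.x"),
    ("Amazon EMR", (6, 2), (6, 3), "3.0.x"),
    ("CDP", (7, 1), (7, 2), "3.0.x"),
    ("Amazon EMR", (6, 3), (6, 6), "3.1.x"),
]


def parse_version(text: str) -> Version:
    """Turn a version string such as '6.4.0' into a comparable tuple (6, 4, 0)."""
    return tuple(int(part) for part in text.split("."))


def spark_universal_version(distribution: str, version: str) -> Optional[str]:
    """Return the Spark version to select, or None if the pair is not listed."""
    parsed = parse_version(version)
    for dist, low, high, spark in RULES:
        # Tuple comparison orders (6, 2) <= (6, 2, 0) < (6, 3) as expected.
        if dist == distribution and low <= parsed < high:
            return spark
    return None


if __name__ == "__main__":
    print(spark_universal_version("Amazon EMR", "6.2"))
```

Keeping the rules as data rather than nested conditions makes it easy to update the table when the compatibility list changes.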