Spark Universal is a mechanism that allows Talend Studio to be
compatible with every big data distribution for a given Spark version. You choose a
Spark version and upload a Hadoop configuration JAR file that contains all the necessary
information to connect to your cluster. With Spark Universal, you can also easily switch
between Spark modes, distributions or environments by changing the Hadoop
configuration JAR file. For more information, see Switching between modes, distributions or environments.
Spark Universal modes and environments compatibility
Talend Studio is compatible with the following modes and environments, depending on the Spark
version:
Mode or environment | Spark 2.4.x | Spark 3.0.x | Spark 3.1.x | Spark 3.2.x
Local mode | | | |
Standalone | | | |
Yarn cluster mode | | | |
Databricks | | | |
Dataproc | | | |
Cloudera Data Engineering | | | |
Spark Universal distributions compatibility
Talend Studio is compatible with the following distributions in Yarn
cluster mode, depending on the Spark version:
Spark 2.4.x | Spark 3.0.x | Spark 3.1.x
For example, if you want to connect to an Amazon EMR 6.2 cluster, you
need to select the Spark 3.0.x version and then upload the Hadoop configuration JAR file
that contains all the *-site.xml files related to the cluster.
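As a rough illustration of what the Hadoop configuration JAR holds, the following Python sketch bundles the *-site.xml files from a local directory into a single JAR archive (a JAR file is a ZIP archive). The helper name build_hadoop_conf_jar, and the choice to place the files at the root of the archive, are assumptions made for illustration only; check the Talend Studio documentation for the layout it actually expects.

```python
import glob
import os
import zipfile
from typing import List


def build_hadoop_conf_jar(conf_dir: str, jar_path: str) -> List[str]:
    """Bundle every *-site.xml file found in conf_dir into a JAR (ZIP) archive.

    Hypothetical helper for illustration only. It assumes the site files
    belong at the root of the archive; it does not validate their content.
    Returns the base names of the packaged files, in sorted order.
    """
    site_files = sorted(glob.glob(os.path.join(conf_dir, "*-site.xml")))
    if not site_files:
        raise FileNotFoundError(f"no *-site.xml files found in {conf_dir}")
    with zipfile.ZipFile(jar_path, "w", compression=zipfile.ZIP_DEFLATED) as jar:
        for path in site_files:
            # Store each file under its base name, at the root of the JAR.
            jar.write(path, arcname=os.path.basename(path))
    return [os.path.basename(path) for path in site_files]
```

For example, calling build_hadoop_conf_jar("/etc/hadoop/conf", "hadoop-conf.jar") would package files such as core-site.xml and yarn-site.xml copied from the cluster; /etc/hadoop/conf is assumed here as a typical Hadoop client configuration directory. Switching to another cluster then amounts to building and uploading a different JAR.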