Defining Spark Universal connection details in the Spark configuration view - Cloud - 8.0

Talend Data Fabric Studio User Guide

Version: Cloud 8.0
Language: English (United States)
Product: Talend Data Fabric
Module: Talend Studio
Content: Design and Development

Complete the Spark Universal connection configuration in the Spark configuration tab of the Run view of your Job. This configuration is effective on a per-Job basis.

The information in this section applies only to users who have subscribed to Talend Data Fabric or to any Talend product with Big Data; it is not applicable to Talend Open Studio for Big Data users.

Talend Studio allows you to run your Spark Jobs on a Spark Universal distribution in any of the following modes and environments:
Cloudera Data Engineering: Studio submits Jobs to the Cloudera Data Engineering service and collects the execution information of your Job from it.

For more information, see Defining Cloudera Data Engineering connection parameters with Spark Universal.

Databricks: Studio submits Jobs to Databricks and collects the execution information of your Job from it. The Spark driver runs on either a transient or an interactive Databricks cluster, on GCP (technical preview), AWS, or Azure.

For more information, see Defining Databricks connection parameters with Spark Universal.
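
For context, the transient-cluster submission that Studio performs can be approximated with the Databricks Jobs API (the legacy 2.0 runs/submit endpoint). The sketch below is illustrative only and is not the code Studio generates; the workspace URL, token variable, node type, Spark version, jar path, and class name are placeholder assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SubmitToDatabricks {
    public static void main(String[] args) throws Exception {
        // Placeholders: replace with your workspace URL and a personal access token.
        String workspace = "https://my-workspace.cloud.databricks.com";
        String token = System.getenv("DATABRICKS_TOKEN");
        // A one-time run on a transient cluster: the cluster is created for the
        // run and terminated afterwards, as described for this mode above.
        String body = """
            {
              "run_name": "spark-job-example",
              "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2
              },
              "libraries": [ { "jar": "dbfs:/jars/my-spark-job.jar" } ],
              "spark_jar_task": { "main_class_name": "org.example.MySparkJob" }
            }""";
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(workspace + "/api/2.0/jobs/runs/submit"))
            .header("Authorization", "Bearer " + token)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        // On success the response carries a run_id, which can then be polled
        // to collect execution information.
        System.out.println(response.body());
    }
}
```

To target an interactive cluster instead, the request would reference an existing cluster's identifier (existing_cluster_id in the same API) rather than defining a new_cluster.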

Dataproc: Studio submits Jobs to Dataproc and collects the execution information of your Job from it.

For more information, see Defining Dataproc connection parameters with Spark Universal.
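
As a rough picture of what a Dataproc submission involves, the following sketch uses the Google Cloud Dataproc Java client (google-cloud-dataproc) to submit a Spark job to an existing cluster. It is not Studio's generated code, and the project, region, cluster, bucket, and class names are placeholder assumptions.

```java
import com.google.cloud.dataproc.v1.Job;
import com.google.cloud.dataproc.v1.JobControllerClient;
import com.google.cloud.dataproc.v1.JobControllerSettings;
import com.google.cloud.dataproc.v1.JobPlacement;
import com.google.cloud.dataproc.v1.SparkJob;

public class SubmitToDataproc {
    public static void main(String[] args) throws Exception {
        String region = "us-central1"; // placeholder region
        // The client must point at the regional Dataproc endpoint.
        JobControllerSettings settings = JobControllerSettings.newBuilder()
            .setEndpoint(region + "-dataproc.googleapis.com:443")
            .build();
        try (JobControllerClient client = JobControllerClient.create(settings)) {
            Job job = Job.newBuilder()
                // Target an existing Dataproc cluster (placeholder name).
                .setPlacement(JobPlacement.newBuilder().setClusterName("my-cluster"))
                .setSparkJob(SparkJob.newBuilder()
                    .setMainClass("org.example.MySparkJob")
                    .addJarFileUris("gs://my-bucket/my-spark-job.jar"))
                .build();
            Job submitted = client.submitJob("my-project", region, job);
            // The returned job reference is what gets polled for execution information.
            System.out.println("Submitted: " + submitted.getReference().getJobId());
        }
    }
}
```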

Kubernetes: Studio submits Jobs to Kubernetes and collects the execution information of your Job from it. The Spark driver runs on the cluster managed by Kubernetes and can run independently of your Studio.

For more information, see Defining Kubernetes connection parameters with Spark Universal.
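
The behavior described for this mode corresponds to Apache Spark's standard Kubernetes properties. The sketch below only assembles such a configuration so you can see the moving parts; the API server address, namespace, container image, and service account are placeholder assumptions, and in practice these values come from the fields you fill in the Spark configuration tab.

```java
import org.apache.spark.SparkConf;

public class KubernetesConfSketch {
    public static void main(String[] args) {
        // A k8s:// master URL points Spark at the Kubernetes API server; the
        // driver then runs in a pod on the cluster, independent of Studio.
        SparkConf conf = new SparkConf()
            .setAppName("spark-on-k8s-sketch")
            .setMaster("k8s://https://kubernetes.example.com:6443")
            .set("spark.kubernetes.namespace", "spark-jobs")
            .set("spark.kubernetes.container.image", "registry.example.com/spark:3.4.1")
            .set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark");
        // In a real deployment these properties are handed to the Spark
        // launcher rather than consumed here; printing them keeps the sketch
        // self-contained.
        for (scala.Tuple2<String, String> kv : conf.getAll()) {
            System.out.println(kv._1() + " = " + kv._2());
        }
    }
}
```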

Local: Studio builds the Spark environment within itself at runtime and runs the Job locally inside the Studio. In this mode, each processor of the local machine is used as a Spark worker to perform the computations.

For more information, see Defining Local connection parameters with Spark Universal.
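
Local mode corresponds to Spark's local[*] master URL, which allocates one worker thread per available processor core, matching the description above. A minimal, self-contained smoke test (the application name and row count are arbitrary):

```java
import org.apache.spark.sql.SparkSession;

public class LocalModeSmokeTest {
    public static void main(String[] args) {
        // local[*] runs the driver and executors inside this JVM, with one
        // worker thread per available processor core.
        SparkSession spark = SparkSession.builder()
            .appName("local-mode-smoke-test")
            .master("local[*]")
            .getOrCreate();
        // A trivial computation to confirm the embedded cluster works.
        long count = spark.range(1, 1_000_000).count();
        System.out.println("count = " + count); // prints 999999
        spark.stop();
    }
}
```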

Standalone: Studio connects to a Spark-enabled cluster and runs the Job from that cluster.

For more information, see Defining Standalone connection parameters with Spark Universal.
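
A standalone cluster is addressed through a spark:// master URL, with 7077 as the default master port. A minimal sketch, assuming a placeholder host name:

```java
import org.apache.spark.sql.SparkSession;

public class StandaloneModeSketch {
    public static void main(String[] args) {
        // spark://host:port addresses the master of a Spark standalone
        // cluster; executors are launched on the cluster's workers.
        SparkSession spark = SparkSession.builder()
            .appName("standalone-mode-sketch")
            .master("spark://spark-master.example.com:7077")
            .getOrCreate();
        System.out.println("Connected to Spark " + spark.version());
        spark.stop();
    }
}
```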

Yarn cluster: Studio submits Jobs to Yarn and collects the execution information of your Job from Yarn and the ApplicationMaster. The Spark driver runs on the cluster and can run independently of your Studio.

For more information, see Defining Yarn cluster connection parameters with Spark Universal.
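
Yarn cluster mode corresponds to Spark's standard spark.master=yarn with spark.submit.deployMode=cluster properties, which host the driver in the ApplicationMaster on the cluster. The sketch below only assembles those properties; in practice cluster deploy mode is driven through a submission tool such as spark-submit, with HADOOP_CONF_DIR pointing at the cluster's configuration files.

```java
import org.apache.spark.SparkConf;

public class YarnClusterConfSketch {
    public static void main(String[] args) {
        // With these properties the driver is hosted by the Yarn
        // ApplicationMaster on the cluster, so the Job keeps running even if
        // the submitting machine disconnects.
        SparkConf conf = new SparkConf()
            .setAppName("yarn-cluster-sketch")
            .setMaster("yarn")
            .set("spark.submit.deployMode", "cluster");
        for (scala.Tuple2<String, String> kv : conf.getAll()) {
            System.out.println(kv._1() + " = " + kv._2());
        }
    }
}
```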