Defining Spark Universal connection details in the Spark configuration view - Cloud - 8.0

Talend Studio User Guide

Version
Cloud
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Cloud
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Content
Design and Development
Last publication date
2024-02-29
Available in...

Big Data

Big Data Platform

Cloud Big Data

Cloud Big Data Platform

Cloud Data Fabric

Data Fabric

Real-Time Big Data Platform

Complete the Spark Universal connection configuration in the Spark configuration tab of the Run view of your Job. This configuration is effective on a per-Job basis.

Talend Studio allows you to run your Spark Jobs on a Spark Universal distribution in any of the following modes and environments:
The supported modes and environments are the following:
Cloudera Data Engineering: Talend Studio submits Jobs and collects the execution information of your Job from the Cloudera Data Engineering service.

For more information, see Defining Cloudera Data Engineering connection parameters with Spark Universal.

Databricks: Talend Studio submits Jobs and collects the execution information of your Job from Databricks. The Spark driver runs either on a Databricks job cluster or on an all-purpose Databricks cluster on GCP, AWS, or Azure.

For more information, see Defining Databricks connection parameters with Spark Universal.

Dataproc: Talend Studio submits Jobs and collects the execution information of your Job from Dataproc.

For more information, see Defining Dataproc connection parameters with Spark Universal.

Kubernetes: Talend Studio submits Jobs and collects the execution information of your Job from Kubernetes. The Spark driver runs on the cluster managed by Kubernetes and can run independently from Talend Studio.

For more information, see Defining Kubernetes connection parameters with Spark Universal.

Local: Talend Studio builds the Spark environment within itself at runtime to run the Job locally in Talend Studio. In this mode, each processor of the local machine is used as a Spark worker to perform the computations.

For more information, see Defining Local connection parameters with Spark Universal.
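For readers familiar with plain Spark, Local mode is conceptually similar to submitting an application with a local master URL. The following sketch is not the script Talend Studio generates; it only illustrates the idea, and the class and JAR names are hypothetical placeholders.

```shell
# Conceptual equivalent of Local mode (illustration only, not the
# command Talend Studio produces): the master URL "local[*]" runs the
# Spark driver and executors in a single local JVM, using every
# processor of the machine as a Spark worker. "local[2]" would limit
# the Job to two workers instead.
spark-submit \
  --master "local[*]" \
  --class org.example.MyTalendJob \
  my_talend_job.jar
```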

Spark-submit scripts: Talend Studio submits Jobs and collects the execution information of your Job from Yarn and the ApplicationMaster of your cluster, typically an HPE Data Fabric cluster. The Spark driver runs on the cluster and can run independently from Talend Studio.

For more information, see Defining Spark-submit scripts connection parameters with Spark Universal.
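In plain Spark terms, this mode corresponds to a cluster-deploy-mode submission to Yarn, where the driver runs inside the cluster's ApplicationMaster rather than in Talend Studio. The sketch below is an assumption-based illustration, not the script Talend Studio generates; the class and JAR names are placeholders.

```shell
# Conceptual sketch of the Spark-submit scripts mode (illustration
# only): with --deploy-mode cluster, Yarn launches the Spark driver in
# the ApplicationMaster on the cluster, so the Job keeps running even
# if Talend Studio disconnects.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.example.MyTalendJob \
  my_talend_job.jar
```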

Standalone: Talend Studio connects to a Spark-enabled cluster and runs the Job from this cluster.

For more information, see Defining Standalone connection parameters with Spark Universal.
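For comparison, a Spark standalone cluster is addressed through a spark:// master URL. The command below is only a conceptual sketch under assumed names: the host is a placeholder, 7077 is Spark's default standalone master port, and the class and JAR names are hypothetical.

```shell
# Conceptual sketch of Standalone mode (illustration only): the Job is
# submitted to the master of a Spark-enabled cluster via its spark://
# master URL. Replace host and port with your cluster's values.
spark-submit \
  --master spark://spark-master.example.com:7077 \
  --class org.example.MyTalendJob \
  my_talend_job.jar
```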

Synapse: Talend Studio submits Jobs and collects the execution information of your Job from Azure Synapse Analytics.

For more information, see Defining the Azure Synapse Analytics connection parameters with Spark Universal.

Yarn cluster: Talend Studio submits Jobs and collects the execution information of your Job from Yarn and the ApplicationMaster. The Spark driver runs on the cluster and can run independently from Talend Studio.

For more information, see Defining Yarn cluster connection parameters with Spark Universal.