About Databricks clusters - Cloud - 8.0

Databricks

Version
Cloud
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
Module
Talend Studio
Content
Design and Development > Designing Jobs > Hadoop distributions > Databricks
Design and Development > Designing Jobs > Serverless > Databricks
Last publication date
2024-02-20

The information in this section is only for users of File or Big Data who run their Spark Jobs on Databricks distributions, on either Azure or AWS.

Databricks clusters are sets of computation resources and configurations on which you can run your Spark Streaming and Spark Batch Jobs. In Talend Studio, you can run your Spark Job either on an all-purpose (interactive) cluster or on a Job cluster.
Note: By default, Spark Jobs run on an all-purpose cluster. You can manage this in the Spark configuration tab in the Run view of your Spark Job. For more information, see Defining the Azure Databricks connection parameters for Spark Jobs.

When you run a Job on an all-purpose cluster in Talend Studio, you can run virtually any workload. All-purpose clusters remain active for an unlimited duration, but you can manually terminate and restart them when needed. Multiple users can share these clusters for collaborative and interactive analytics.

When you run a Job on a Job cluster in Talend Studio, the Job is processed faster and the cluster automatically shuts down when processing is finished, which lowers usage costs. Job clusters are created according to your Spark configuration, and you cannot restart them once they have shut down.
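The difference between the two modes can be sketched with the kind of Databricks Jobs API task payloads they correspond to: an all-purpose cluster is referenced by its ID and outlives the run, while a Job cluster is defined inline and exists only for the run. This is a minimal illustration, not Talend Studio's actual output; the cluster ID, node type, and runtime version below are hypothetical placeholders.

```python
# Hedged sketch of the two Databricks cluster modes at the Jobs API level.
# All IDs, node types, and version strings are made-up examples.

# All-purpose (interactive) cluster: the task references an existing
# cluster by ID; the cluster keeps running after the Job finishes.
all_purpose_task = {
    "task_key": "spark_batch_job",
    "existing_cluster_id": "0123-456789-abcdefgh",  # hypothetical cluster ID
}

# Job cluster: the cluster is defined inline, created from the given
# Spark configuration for this run, and shut down when processing ends.
job_cluster_task = {
    "task_key": "spark_batch_job",
    "new_cluster": {
        "spark_version": "13.3.x-scala2.12",  # example Databricks runtime
        "node_type_id": "Standard_DS3_v2",    # example Azure node type
        "num_workers": 2,
    },
}

print(sorted(job_cluster_task["new_cluster"]))
```

The trade-off shown here mirrors the prose above: the `existing_cluster_id` form favors reuse and shared interactive work, while the `new_cluster` form favors isolation and pay-per-run cost, at the price of not being restartable.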