
About Databricks clusters

The information in this section applies only to users of Talend products with Big Data who run their Spark Jobs on Databricks distributions, on either Azure or AWS.

Databricks clusters are a set of computation resources and configurations on which you can run your Spark Streaming and Spark Batch Jobs. In Talend Studio you can either run your Spark Job on an all-purpose cluster or on a job cluster.
Note: By default, Spark Jobs run on an all-purpose cluster. You can manage this in the Spark configuration tab in the Run view of your Spark Job. For more information, see Defining the Azure Databricks connection parameters for Spark Jobs.

When you run a Job on an all-purpose cluster in Talend Studio, you can run any type of workload. All-purpose clusters (formerly known as interactive clusters) are created for an indefinite duration, but you can manually terminate and restart them if needed. Multiple users can share such clusters for collaborative, interactive analytics.

When you run a Job on a job cluster in Talend Studio, the Job is processed faster, and the cluster automatically shuts down when processing is finished, lowering usage costs. Job clusters are created according to your Spark configuration, and you cannot restart them once they are shut down.
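The practical difference between the two cluster types is easiest to see in the shape of a Databricks Jobs API request: an all-purpose cluster is referenced by its ID and outlives the run, while a job cluster is described by a spec and created for that run only. The sketch below is purely illustrative; the cluster ID, runtime version, and node types are placeholders, and Talend Studio generates the actual request from the values in your Spark configuration tab rather than from code you write.

```python
# Illustrative payloads in the style of the Databricks Jobs API
# (POST /api/2.1/jobs/create). All IDs and versions are placeholders.

# All-purpose cluster: the Job attaches to an existing cluster that
# keeps running (and billing) after the Job finishes.
all_purpose_job = {
    "name": "spark-job-on-all-purpose-cluster",
    "tasks": [{
        "task_key": "main",
        "existing_cluster_id": "1234-567890-abcde123",  # placeholder ID
        "spark_jar_task": {"main_class_name": "com.example.Main"},
    }],
}

# Job cluster: a cluster is created from this spec for the run and is
# terminated automatically when the run completes; it cannot be restarted.
job_cluster_job = {
    "name": "spark-job-on-job-cluster",
    "tasks": [{
        "task_key": "main",
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",  # placeholder runtime
            "node_type_id": "Standard_DS3_v2",    # Azure example node type
            "num_workers": 2,
        },
        "spark_jar_task": {"main_class_name": "com.example.Main"},
    }],
}
```

The key distinction is which field the task carries: `existing_cluster_id` points at a long-lived shared cluster, whereas `new_cluster` requests an ephemeral one sized for the Job.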
