Creating a Databricks cluster compatible with Talend Cloud Pipeline Designer - Cloud

Talend Cloud Management Console for Pipelines User Guide

author: Talend Documentation Team
EnrichVersion: Cloud
EnrichProdName: Talend Cloud
task: Administration and Monitoring > Managing projects; Administration and Monitoring > Managing users; Deployment > Deploying > Executing Tasks; Deployment > Scheduling > Scheduling Tasks
EnrichPlatform: Talend Management Console

This document describes how to create a Databricks interactive cluster that can be used with Talend Cloud Pipeline Designer.

Interactive clusters come with some limitations. In particular, the master and worker nodes are started when the cluster is created and keep running until the cluster is shut down. As a result, the initial cluster configuration (the Spark configuration) cannot be changed at runtime.

  • Interactive clusters must be created following this detailed procedure. Failure to do so can prevent you from submitting pipelines to the created cluster. Follow the procedure carefully, as any missing step can lead to unexpected errors at runtime. An error message informs you if your cluster is not compatible with Talend Cloud Pipeline Designer.
  • Interactive clusters are tied to a single Remote Engine Gen2 version. This requirement comes from the master and worker limitation mentioned above. An error message informs you if your cluster is not compatible with your Remote Engine Gen2 version.
  • Interactive clusters are tied to a single staging directory, the one configured at cluster creation. A different staging directory set through a run profile is not taken into account. This requirement also comes from the master and worker limitation mentioned above. A warning message in the logs informs you if the staging directory from your run profile differs from the one used at cluster creation.
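
The last two constraints can be pictured as a pre-submission check. The following Python sketch is illustrative only and is not Talend's actual validation code: the names `ClusterSetup` and `check_submission`, the version strings, and the staging paths are all hypothetical. It only mirrors the behavior described above, where a Remote Engine Gen2 version mismatch is reported as an error and a staging directory mismatch is reported as a warning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSetup:
    """Values fixed when the interactive cluster is created (hypothetical model)."""
    engine_version: str  # Remote Engine Gen2 version the cluster was prepared for
    staging_dir: str     # staging directory configured at cluster creation


def check_submission(cluster, engine_version, run_profile_staging_dir):
    """Return (errors, warnings) for a pipeline submission.

    Assumption: this reproduces the rules stated in the list above,
    not the real checks performed by the product.
    """
    errors = []
    warnings = []

    # The cluster is tied to one Remote Engine Gen2 version: a mismatch is an error.
    if engine_version != cluster.engine_version:
        errors.append(
            f"Cluster was prepared for Remote Engine Gen2 {cluster.engine_version}, "
            f"but the pipeline is submitted from {engine_version}."
        )

    # The cluster keeps its original staging directory: a mismatch is only a warning,
    # and the directory from the run profile is not used.
    if run_profile_staging_dir != cluster.staging_dir:
        warnings.append(
            f"Run profile staging directory {run_profile_staging_dir} differs from "
            f"the one set at cluster creation ({cluster.staging_dir})."
        )

    return errors, warnings


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cluster = ClusterSetup(engine_version="2.0", staging_dir="dbfs:/tmp/staging")
    errors, warnings = check_submission(cluster, "2.1", "dbfs:/tmp/other")
    for message in errors:
        print("ERROR:", message)
    for message in warnings:
        print("WARNING:", message)
```

In this sketch, a version mismatch would block the submission, while a staging directory mismatch lets the run continue with the directory configured at cluster creation.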