Creating the cluster - Cloud

Talend Cloud Management Console for Pipelines User Guide

author
Talend Documentation Team
EnrichVersion
Cloud
EnrichProdName
Talend Cloud
task
Administration and Monitoring > Managing projects
Administration and Monitoring > Managing users
Deployment > Deploying > Executing Tasks
Deployment > Scheduling > Scheduling Tasks
EnrichPlatform
Talend Management Console

Procedure

  1. Log in to your Databricks account.
  2. Open the Create Cluster wizard.
  3. Fill in the basic settings. The only field specific to Talend Cloud Pipeline Designer is Databricks Runtime Version: select Runtime: 5.5 LTS (Scala 2.11, Spark 2.4.3).
  4. In the Advanced Options section:
    1. In the Instances tab, set up your instances according to your needs.
    2. In the Spark tab, paste this in the Spark config area:
      spark.executor.extraJavaOptions -Dtalend.component.manager.m2.repository=/dbfs/DBFS_STAGING_DIRECTORY_NAME/connectors -Dtalend.spark.streaming.batch.interval=5000
      spark.driver.extraJavaOptions -Dtalend.component.manager.m2.repository=/dbfs/DBFS_STAGING_DIRECTORY_NAME/connectors -Dtalend.spark.streaming.batch.interval=5000
      where DBFS_STAGING_DIRECTORY_NAME corresponds to the name of your DBFS staging directory.
      Note: You have to use the same DBFS staging directory when creating your run profile in Talend Cloud Management Console.
    3. In the Tags tab, add this tag to indicate that the cluster is created for Talend Cloud Pipeline Designer:
      Key: TALEND_TPD_CLUSTER_TYPE

      Value: TPD_COMPATIBLE_INTERACTIVE_CLUSTER_1.0

    4. In the Logging tab, add the path to the cluster log storage directory:
      Destination: DBFS

      Cluster Log Path: dbfs:/DBFS_STAGING_DIRECTORY_NAME/cluster_logs

      where DBFS_STAGING_DIRECTORY_NAME corresponds to the name of your DBFS staging directory.
      Note: You have to use the same DBFS staging directory when creating your run profile in Talend Cloud Management Console.
    5. In the Init Scripts tab, add this DBFS initialization script:
      Destination: DBFS

      Init Script Path: dbfs:/DBFS_STAGING_DIRECTORY_NAME/scripts/databricks_spark_2.4.X_patches.sh

      where DBFS_STAGING_DIRECTORY_NAME corresponds to the name of your DBFS staging directory.
      Note:
      • You have to use the same DBFS staging directory when creating your run profile in Talend Cloud Management Console.
      • If you were using a version of Databricks prior to 5.5 LTS, you need to use a new, empty staging folder in DBFS, as you cannot reuse the staging folder of the previous Databricks version.
    6. Click Create Cluster to finalize the cluster creation.
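
The wizard settings above can also be captured as a cluster specification and submitted from the command line. The sketch below assumes the Databricks CLI (`databricks clusters create --json-file`) is installed and configured; `my-staging-dir` is a hypothetical placeholder for your DBFS staging directory name, and the node type and worker count are examples you should adjust to your needs.

```shell
#!/bin/sh
# Sketch: build a cluster spec mirroring the wizard settings above.
# "my-staging-dir" stands in for DBFS_STAGING_DIRECTORY_NAME;
# node_type_id and num_workers are illustrative values only.
STAGING=my-staging-dir

cat > cluster.json <<EOF
{
  "cluster_name": "tpd-cluster",
  "spark_version": "5.5.x-scala2.11",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "spark_conf": {
    "spark.executor.extraJavaOptions": "-Dtalend.component.manager.m2.repository=/dbfs/${STAGING}/connectors -Dtalend.spark.streaming.batch.interval=5000",
    "spark.driver.extraJavaOptions": "-Dtalend.component.manager.m2.repository=/dbfs/${STAGING}/connectors -Dtalend.spark.streaming.batch.interval=5000"
  },
  "custom_tags": {
    "TALEND_TPD_CLUSTER_TYPE": "TPD_COMPATIBLE_INTERACTIVE_CLUSTER_1.0"
  },
  "cluster_log_conf": {
    "dbfs": { "destination": "dbfs:/${STAGING}/cluster_logs" }
  },
  "init_scripts": [
    { "dbfs": { "destination": "dbfs:/${STAGING}/scripts/databricks_spark_2.4.X_patches.sh" } }
  ]
}
EOF

# Submit the spec (requires a configured Databricks CLI):
# databricks clusters create --json-file cluster.json
echo "wrote cluster.json"
```

Remember that the same staging directory name must be reused when you create the run profile in Talend Cloud Management Console.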