Copying the engine dependencies to Databricks - Cloud

Talend Cloud Management Console for Pipelines User Guide

author
Talend Documentation Team
EnrichVersion
Cloud
EnrichProdName
Talend Cloud
task
Administration and Monitoring > Managing projects
Administration and Monitoring > Managing users
Deployment > Deploying > Executing Tasks
Deployment > Scheduling > Scheduling Tasks
EnrichPlatform
Talend Management Console

Before you begin

  • A Remote Engine Gen2 is installed on your local network or on your Virtual Private Cloud.
  • The Databricks command-line interface (CLI) is installed.
    Tip: If the databricks command cannot be found, it may have been installed to the ~/.local/bin folder, which is not always on your PATH.
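If the tip above applies, one way to make the CLI resolvable for the rest of this procedure is to extend PATH for the current shell session. This is a minimal sketch, assuming a pip user install placed the command under ~/.local/bin:

```shell
# Add ~/.local/bin to PATH only when the "databricks" command
# is not already resolvable in this shell session:
if ! command -v databricks >/dev/null 2>&1; then
  export PATH="$PATH:$HOME/.local/bin"
fi
```

After this, running databricks --version should print the installed CLI version if the install location assumption holds.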

Procedure

  1. Copy these directories from the Livy container to a directory on the host:
    docker cp <livy_container_name>:/opt/talend/connectors <hostDirectory>
    docker cp <livy_container_name>:/opt/datastreams-deps <hostDirectory>

    Replace <livy_container_name> with the name of your Livy container and <hostDirectory> with the path to your host directory.

  2. Copy the directories from the host to the Databricks staging directory (DBFS):
    databricks fs cp -r <hostDirectory>/connectors dbfs:/tpd-staging/connectors
    databricks fs cp -r <hostDirectory>/datastreams-deps dbfs:/tpd-staging/datastreams-deps

    Replace <hostDirectory> with the path to your host directory.

  3. Generate a state file listing the copied files:
    find <hostDirectory>/connectors/ -type f | sed 's|^<hostDirectory>/connectors/||' | awk '{print "connectors;" $0}' > ./.state
    find <hostDirectory>/datastreams-deps/ -type f | grep -Ev '\.xml$' | sed 's|^<hostDirectory>/datastreams-deps/||' | awk '{print "datastreams-deps;" $0}' >> ./.state

    Replace <hostDirectory> with the path to your host directory (in the sed expressions as well). Each line of the state file has the form <stagingSubdirectory>;<relativeFilePath>.
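You can try the state-file generation locally before running it against your real host directory. The sketch below uses a throwaway directory with made-up file names (jdbc/driver.jar, common.jar, ignore-me.xml are purely illustrative), and strips the host-directory prefix so each entry in the state file is relative to the staging root:

```shell
# Throwaway stand-in for <hostDirectory>, populated with fake files:
HOST_DIR=$(mktemp -d)
mkdir -p "$HOST_DIR/connectors/jdbc" "$HOST_DIR/datastreams-deps"
touch "$HOST_DIR/connectors/jdbc/driver.jar" \
      "$HOST_DIR/datastreams-deps/common.jar" \
      "$HOST_DIR/datastreams-deps/ignore-me.xml"

# Same pipelines as step 3: list files, strip the host-directory
# prefix, and prefix each entry with its staging subdirectory.
find "$HOST_DIR/connectors/" -type f \
  | sed "s|^$HOST_DIR/connectors/||" \
  | awk '{print "connectors;" $0}' > ./.state
find "$HOST_DIR/datastreams-deps/" -type f \
  | grep -Ev '\.xml$' \
  | sed "s|^$HOST_DIR/datastreams-deps/||" \
  | awk '{print "datastreams-deps;" $0}' >> ./.state

cat ./.state
# connectors;jdbc/driver.jar
# datastreams-deps;common.jar

rm -rf "$HOST_DIR"
```

Note that the .xml files under datastreams-deps are excluded from the state file, matching the grep filter in step 3.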

  4. Copy the state file to the Databricks staging directory (DBFS):
    databricks fs cp ./.state dbfs:/tpd-staging/
    Note: The Databricks CLI aliases databricks fs to dbfs, so the databricks fs and dbfs commands are equivalent.