
Mount the Azure Data Lake Storage Gen2 filesystem to be used with DBFS

Before you begin

  • Ensure that you have granted your application read and write permissions to your ADLS Gen2 filesystem.

Procedure

  1. Download the Databricks CLI and install it as described in the Databricks documentation: Databricks Command-Line Interface.
  2. Use the Databricks CLI to create a Databricks-backed secret scope. For example, name this scope talendadlsgen2. The command to be used is:

    Example

    databricks secrets create-scope --scope talendadlsgen2 --initial-manage-principal users

    This command grants all users access permissions to this secret scope.
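
    To confirm that the scope was created, you can optionally list the existing secret scopes with the same CLI:

    Example

    databricks secrets list-scopes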

  3. Add a secret to this scope using the following command:

    Example

    databricks secrets put --scope talendadlsgen2 --key adlscredentials

    In this command, talendadlsgen2 is the name of the secret scope created in the previous step; adlscredentials is the key of the secret to be created.

  4. Once the command in the previous step is run, a text editor opens automatically. Paste the value of the adlscredentials secret into this editor, then save and exit the editor. In this step, this value is the client secret of your ADLS Gen2 storage account.

    Example

    # ----------------------------------------------------------------------
    # Do not edit the above line. Everything below it will be ignored.
    # Please input your secret value above the line. Text will be stored in
    # UTF-8 (MB4) form and any trailing new line will be stripped.
    # Exit without saving will abort writing secret.

    The secret value must be added above the dashed line.
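
    Alternatively, if your version of the Databricks CLI supports the --string-value option, you can pass the secret value directly instead of using the editor; note that the value then appears in your shell history:

    Example

    databricks secrets put --scope talendadlsgen2 --key adlscredentials --string-value "<client-secret>"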

  5. Repeat this process to add the following secrets, one at a time, to the talendadlsgen2 secret scope; you can then verify them as shown below:
    • adlsclientid: the value of this secret is the application ID of your ADLS Gen2 storage account.
    • adlsendpoint: the value of this secret is the OAuth 2.0 token endpoint of your ADLS Gen2 storage account.
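
    Once the secrets are stored, you can list the keys in the scope to verify them:

    Example

    databricks secrets list --scope talendadlsgen2
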
  6. On your Azure Databricks portal, create a Databricks cluster from the Azure Databricks Workspace. The version of this cluster must be among those supported by Talend.
  7. Once the cluster is created and running, switch back to the Azure Databricks Workspace and click Create a Blank Notebook.

  8. Add the following Scala code to this Notebook and replace <file-system-name>, <storage-account-name>, and <mount-name> with their actual values:

    Example

    // OAuth 2.0 settings for ADLS Gen2; the credentials are read from the
    // talendadlsgen2 secret scope created in the previous steps.
    val configs = Map(
      "fs.azure.account.auth.type" -> "OAuth",
      "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
      "fs.azure.account.oauth2.client.id" -> dbutils.secrets.get(scope = "talendadlsgen2", key = "adlsclientid"),
      "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope = "talendadlsgen2", key = "adlscredentials"),
      "fs.azure.account.oauth2.client.endpoint" -> dbutils.secrets.get(scope = "talendadlsgen2", key = "adlsendpoint")
    )

    // Optionally, you can add <directory-name> to the source URI of your mount point.
    dbutils.fs.mount(
      source = "abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/",
      mountPoint = "/mnt/<mount-name>",
      extraConfigs = configs)
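
    To confirm that the mount succeeded, you can list the contents of the mount point with the dbutils.fs.ls utility; <mount-name> is the same placeholder as above:

    Example

    // Each entry returned by dbutils.fs.ls is a FileInfo object; print its path.
    dbutils.fs.ls("/mnt/<mount-name>").foreach(f => println(f.path))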
    					
  9. Optionally, append the following lines to the code added in the previous step:

    Example

    val df = spark.read.text("/mnt/<mount-name>/<file-location>")
    df.show()

    These lines allow you to access files in your ADLS Gen2 filesystem as if they were in DBFS.
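
    If you later need to remove the mount point, unmount it with the dbutils.fs.unmount utility; <mount-name> is the same placeholder as above:

    Example

    // Detach the ADLS Gen2 filesystem from DBFS when it is no longer needed.
    dbutils.fs.unmount("/mnt/<mount-name>")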

Results

If the ADLS Gen2 filesystem to be mounted contains files, run this Notebook. You should then see the data stored in the file specified by <file-location> in the lines appended in the last step.
