Configuring the connection to the Azure Data Lake Storage service to be used by Spark - 7.1

Databricks

Author: Talend Documentation Team
Version: Cloud, 7.1
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Real-Time Big Data Platform
Platform: Talend Studio

Procedure

  1. Double-click tAzureFSConfiguration to open its Component view.
    Spark uses this component to connect to the Azure Data Lake Storage system to which your Job writes the actual business data.
  2. From the Azure FileSystem drop-down list, select Azure Datalake Storage to use Data Lake Storage as the target system.
  3. In the Datalake storage account field, enter the name of the Data Lake Storage account you need to access.
    Ensure that the administrator of the system has granted your Azure account the appropriate access permissions to this Data Lake Storage account.
  4. In the Client ID and Client key fields, enter, respectively, the authentication ID and the authentication key that were generated when you registered the application your Job uses to access Azure Data Lake Storage.

    Ensure that this application has the appropriate permissions to access Azure Data Lake. You can check this in the Required permissions view of the application on Azure. For further information, see the Azure documentation article Assign the Azure AD application to the Azure Data Lake Storage account file or folder.

    This application must be the one to which you assigned permissions to access your Azure Data Lake Storage in the previous step.

  5. In the Token endpoint field, paste the OAuth 2.0 token endpoint, which you can obtain from the Endpoints list on the App registrations page of your Azure portal.
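
For readers who want to see how the four values entered above relate to a Spark connection, the following Python sketch maps them to the OAuth properties of the Hadoop ADLS Gen1 connector (`fs.adl.oauth2.*`). It is an illustration under assumptions, not a description of what tAzureFSConfiguration does internally: the assumptions are that the target is Data Lake Storage Gen1 and that the Hadoop `adl://` connector is in use. The function names `adls_oauth_properties` and `adls_uri` are hypothetical helpers introduced only for this example.

```python
from urllib.parse import urlparse


def adls_oauth_properties(account, client_id, client_key, token_endpoint):
    """Build Hadoop ADLS Gen1 OAuth properties from the four field values.

    Hypothetical helper: the property names are those of the Hadoop
    adl:// connector; whether Talend sets exactly these is an assumption.
    """
    fields = {
        "account": account,
        "client_id": client_id,
        "client_key": client_key,
        "token_endpoint": token_endpoint,
    }
    # Reject empty or whitespace-only values before building anything.
    missing = [name for name, value in fields.items()
               if not value or not value.strip()]
    if missing:
        raise ValueError("missing required value(s): " + ", ".join(missing))

    # The token endpoint copied from the Azure portal is an https URL.
    endpoint = token_endpoint.strip()
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("token endpoint must be an https URL")

    return {
        "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
        "fs.adl.oauth2.client.id": client_id.strip(),
        "fs.adl.oauth2.credential": client_key.strip(),
        "fs.adl.oauth2.refresh.url": endpoint,
    }


def adls_uri(account, path="/"):
    """Return the adl:// URI for a path in the given Gen1 storage account."""
    return "adl://{}.azuredatalakestore.net/{}".format(
        account.strip(), path.lstrip("/"))
```

In a hand-written PySpark application, each of these properties could be passed to the session builder with a `spark.hadoop.` prefix, for example `SparkSession.builder.config("spark.hadoop." + key, value)`. In Talend Studio, the component handles the connection, so this sketch is only meant to clarify what each field represents.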