Adding S3 specific properties to access the S3 system from Databricks - Cloud - 8.0

Version: Cloud, 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development > Designing Jobs > Hadoop distributions > Databricks; Design and Development > Designing Jobs > Serverless > Databricks
Last publication date: 2024-02-20

Add the S3-specific properties to the Spark configuration of your Databricks cluster on AWS.

Before you begin

  • Ensure that your Spark cluster in Databricks has been properly created, is running, and that its version is 3.5 LTS. For further information, see Create Databricks workspace in the Databricks documentation.
  • You have an AWS account.
  • The S3 bucket to be used has been properly created, and you have the appropriate permissions to access it (see the sketch after this list).
  • When you are using a Machine Learning component or tMatchPredict, you have set the Databricks Runtime Version setting to X.X LTS ML.
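
  The following is a minimal sketch, not part of the Talend procedure, for confirming that the bucket exists and that your credentials can reach it before you configure the cluster. The bucket name my-s3-bucket is a placeholder; replace it and the credential values with your own.

    import boto3

    # Placeholder credentials and bucket name; replace them with your own values.
    s3 = boto3.client(
        "s3",
        aws_access_key_id="<your_access_key>",
        aws_secret_access_key="<your_secret_key>",
    )

    # head_bucket raises an error if the bucket does not exist or cannot be accessed.
    s3.head_bucket(Bucket="my-s3-bucket")

    # List a few objects to confirm read permissions on the bucket.
    response = s3.list_objects_v2(Bucket="my-s3-bucket", MaxKeys=5)
    for obj in response.get("Contents", []):
        print(obj["Key"])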

Procedure

  1. On the Configuration tab of your Databricks cluster page, scroll down to the Spark tab at the bottom of the page.

  2. Click Edit to make the fields on this page editable.
  3. In the Spark tab, enter the Spark properties corresponding to the credentials to be used to access your S3 system (a notebook-based check is sketched after the property list below).
    • S3N
      spark.hadoop.fs.s3n.awsAccessKeyId <your_access_key>
      spark.hadoop.fs.s3n.awsSecretAccessKey <your_secret_key>
    • S3A
      spark.hadoop.fs.s3a.access.key <your_access_key>
      spark.hadoop.fs.s3a.secret.key <your_secret_key>
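    As a quick, optional check that is not part of the Talend procedure, the sketch below reads a hypothetical S3A path from a Databricks notebook once the properties above are applied; the bucket name my-s3-bucket and the prefix are placeholders.

      from pyspark.sql import SparkSession

      # On Databricks, a SparkSession already exists; getOrCreate() reuses it.
      spark = SparkSession.builder.getOrCreate()

      # With spark.hadoop.fs.s3a.access.key and spark.hadoop.fs.s3a.secret.key set
      # in the cluster Spark configuration, s3a:// paths can be read directly.
      df = spark.read.text("s3a://my-s3-bucket/some-prefix/")
      df.show(5)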
  4. If you need to run Spark Streaming Jobs with Databricks, add the following property in the same Spark tab to define a default Spark serializer (a quick notebook check is sketched below). If you do not plan to run Spark Streaming Jobs, you can skip this step.
    spark.serializer org.apache.spark.serializer.KryoSerializer
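    As a minimal, optional check that is not part of the Talend procedure, you can confirm the serializer setting from a notebook after the cluster restarts:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      # Prints org.apache.spark.serializer.KryoSerializer once the property is applied.
      print(spark.sparkContext.getConf().get("spark.serializer", "not set"))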
  5. Restart your Spark cluster.
  6. In the Spark UI tab of your Databricks cluster page, click Environment to display the list of properties and verify that each of the properties you added in the previous steps is present in that list. You can also perform this check from a notebook, as sketched below.
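    The sketch below is a notebook-based alternative, not part of the Talend procedure, that prints every Spark configuration entry whose key mentions s3 so you can confirm the credential properties were picked up. Note that the secret key is printed in plain text.

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      # getConf().getAll() returns the (key, value) pairs of the cluster Spark configuration.
      for key, value in spark.sparkContext.getConf().getAll():
          if "s3" in key:
              print(key, "=", value)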