Available in: Big Data Platform, Cloud API Services Platform, Cloud Big Data Platform, Cloud Data Fabric, Cloud Data Management Platform, Data Fabric, Data Management Platform, Data Services Platform, MDM Platform, Real-Time Big Data Platform
About this task
To connect to a Databricks cluster on Amazon S3, follow this procedure. To access the S3 system from Databricks, you also need to add S3-specific properties, as described in Adding S3 specific properties to access the S3 system from Databricks.
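The S3-specific properties referred to above are typically the Hadoop S3A credentials. The sketch below is a minimal, hypothetical Java illustration of setting such properties on a Spark session; fs.s3a.access.key and fs.s3a.secret.key are standard Hadoop S3A property names, and every value shown is a placeholder, not part of the Talend procedure.

import org.apache.spark.sql.SparkSession;

public class S3PropertiesSketch {
    public static void main(String[] args) {
        // Assumes this runs where a Spark session is available, e.g. a Databricks job.
        SparkSession spark = SparkSession.builder().getOrCreate();

        // Standard Hadoop S3A credential properties; values are placeholders.
        spark.sparkContext().hadoopConfiguration().set("fs.s3a.access.key", "<your-access-key>");
        spark.sparkContext().hadoopConfiguration().set("fs.s3a.secret.key", "<your-secret-key>");

        // Quick sanity check that the bucket is reachable (hypothetical bucket and path).
        spark.read().text("s3a://<bucket>/<path>").show(5);
    }
}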
Procedure
- In the DQ Repository tree view, expand Metadata and right-click DB Connections.
- Click Create DB connection. The Database Connection wizard is displayed.
- Enter a name and click Next. The other fields are optional.
- Select JDBC as the DB Type.
- In the JDBC URL field, enter the URL of your Databricks cluster. To get the URL:
  - Go to your Databricks workspace.
  - In the clusters list, click the cluster you want to connect to.
  - Expand the Advanced Options section and select the JDBC/ODBC tab.
  - Copy the content of the JDBC URL field. The URL format is:
    jdbc:spark://<server-hostname>:<port>/default;transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3
  Note: To encrypt the token in a safer way, it is recommended to enter the UID and PWD parameters in the Database Connection wizard of Talend Studio rather than in the URL.
- Go back to the Database Connection wizard.
- Paste the JDBC URL.
- Add the JDBC driver to the Drivers list:
  - Click the [+] button. A new line is added to the list.
  - Click the […] button next to the new line. The Module dialog box is displayed.
  - In the Platform list, select the JDBC driver and click OK. You are back to the Database Connection wizard.
- Click Select class name next to the Driver Class field and select com.simba.spark.jdbc4.Driver.
- Enter the User Id and Password.
- In Mapping file, select Mapping Hive.
- Click Test Connection.
  - If the test is successful, click Finish to close the wizard.
  - If the test fails, verify the configuration.
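Once the wizard is configured, you can double-check the same connection outside Talend Studio with a few lines of plain JDBC. The following is a minimal sketch, assuming the Simba Spark driver JAR is on the classpath; the server hostname, port, HTTP path, and personal access token are placeholders taken from the URL format above, and UID/PWD are passed as properties rather than in the URL, as the note above recommends.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class DatabricksConnectionTest {
    public static void main(String[] args) throws Exception {
        // Same driver class as selected in the Driver Class field.
        Class.forName("com.simba.spark.jdbc4.Driver");

        // URL format copied from the cluster's JDBC/ODBC tab (placeholders assumed).
        String url = "jdbc:spark://<server-hostname>:<port>/default"
                + ";transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3";

        // With AuthMech=3, the user name is the literal string "token" and the
        // password is a Databricks personal access token (placeholder below).
        Properties props = new Properties();
        props.put("UID", "token");
        props.put("PWD", "<personal-access-token>");

        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            if (rs.next()) {
                System.out.println("Connection OK: " + rs.getInt(1));
            }
        }
    }
}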