Before you begin
- You have selected the Profiling perspective of Talend Studio.
- You have added the JDBC driver to the Studio.
To connect to a Databricks cluster on Amazon S3, first follow the procedure "Adding S3 specific properties to access the S3 system from Databricks".
About this task
- In the DQ Repository tree view, expand Metadata and right-click DB Connections.
Click Create DB connection.
The Database Connection wizard is displayed.
- Enter a name and click Next. The other fields are optional.
- Select JDBC as the DB Type.
In the JDBC URL field, enter the URL of your ADLS Databricks cluster. To get the URL:
- Go to Azure Databricks.
- In the clusters list, click the cluster you want to connect to.
- Expand the Advanced Options section and select the JDBC/ODBC tab.
Copy the content of the JDBC URL field. The URL format is
jdbc:spark://<server-hostname>:<port>/default;transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3.
Note: To encrypt the token in a safer way, it is recommended to enter the PWD parameters in the Database Connection wizard of Talend Studio.
- Go back to the Database Connection wizard.
- Paste the JDBC URL.
Add the JDBC driver to the Drivers list:
- Click the [+] button. A new line is added to the list.
- Click the […] button next to the new line. The Module dialog box is displayed.
- In the Platform list, select the JDBC driver and click OK. You return to the Database Connection wizard.
- Click Select class name next to the Driver Class field and select com.simba.spark.jdbc4.Driver.
- Enter the User Id and Password.
In Mapping file, select Mapping
You have the following configuration:
Click Test Connection.
- If the test is successful, click Finish to close the wizard.
- If the test fails, verify the configuration.
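
If the test fails, the JDBC URL is a good first thing to check. The following is a minimal Python sketch, not part of Talend Studio, that assembles a URL in the format given above and reports properties that differ from that format. The helper names build_jdbc_url and check_jdbc_url are hypothetical, and the hostname, port and HTTP path in the usage lines are placeholder values, not real cluster settings.

```python
import re

# Properties and values taken from the URL format shown in the procedure.
REQUIRED_PROPERTIES = {"transportMode": "http", "ssl": "1", "AuthMech": "3"}


def build_jdbc_url(server_hostname, port, http_path):
    """Assemble a URL in the documented format (hypothetical helper)."""
    return (
        f"jdbc:spark://{server_hostname}:{port}/default;"
        f"transportMode=http;ssl=1;httpPath={http_path};AuthMech=3"
    )


def check_jdbc_url(url):
    """Return a list of problems found in the URL; empty if none are found."""
    match = re.match(r"^jdbc:spark://([^:/;]+):(\d+)/default;(.*)$", url)
    if match is None:
        return ["URL does not start with jdbc:spark://<server-hostname>:<port>/default;"]
    # Split the trailing ";key=value" pairs into a dictionary.
    properties = dict(
        part.split("=", 1) for part in match.group(3).split(";") if "=" in part
    )
    problems = []
    for key, expected in REQUIRED_PROPERTIES.items():
        if properties.get(key) != expected:
            problems.append(f"expected {key}={expected}")
    if not properties.get("httpPath"):
        problems.append("httpPath is missing or empty")
    return problems


# Placeholder values for illustration only.
url = build_jdbc_url("example.azuredatabricks.net", 443, "sql/protocolv1/o/0/example")
print(url)
print(check_jdbc_url(url))
```

This check only compares the URL with the format shown in this procedure. It does not contact the cluster, so a URL that passes can still fail in Test Connection because of credentials, the driver, or network access.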