Talend Studio allows you to authenticate to your Spark Streaming and Spark Batch Jobs using Knox with a CDP Public Cloud Data Hub instance in YARN cluster mode. You can complete the Knox connection parameters either in the Spark configuration tab of the Run view of your Job or in the Hadoop Cluster Connection metadata wizard. This configuration is effective on a per-Job basis.
In this scenario, the configuration via the Hadoop Cluster Connection metadata wizard is used. Setting up the connection to Knox in the Repository allows you to avoid configuring that connection each time you need it in the Spark Configuration view of your Spark Jobs.
For more information about the configuration via the Spark configuration tab of the Run view of your Job, see Defining the Cloudera connection parameters.
The information in this section applies only to users who have subscribed to Talend Data Fabric or to any Talend product with Big Data. It does not apply to Talend Open Studio for Big Data users.
- In the Repository tree view of your Studio, expand Metadata and then right-click Hadoop cluster.
- Select Create Hadoop cluster from the contextual menu to open the Hadoop Cluster Connection wizard.
- Fill in generic information about this connection, such as Name and Description, and click Next to open the Hadoop Configuration Import Wizard window, which allows you to select the distribution to be used and the manual or automatic mode to configure the connection.
Important: Knox is only supported with CDP 7.1 and onwards.
- Select Cloudera from the Distribution drop-down list and Cloudera CDP 7.1 from the Version drop-down list.
- Select Enter manually Hadoop services and click Finish.
- Select the Use Knox check box and enter the Knox-related parameters:
  - Knox URL: enter the Knox URL respecting the following format: https://<host>/<datahub>/cdp-proxy-api. You can find the Knox URL in the Cloudera Management Console, in the Endpoints section of your Data Hub, under Livy Server.
    Important: If you have the R2021-07 or a previous patch installed, the URL must not include /livy or any other suffix after cdp-proxy-api. If you have the R2021-08 or a later patch installed, the URL works with or without /livy at the end.
  - Knox user: enter your Workload User Name from Cloudera Management Console.
  - Knox password: enter your Workload Password from Cloudera Management Console.
  - Knox directory: type in the location storing the loaded file in HDFS.
  - Knox session timeout: specify the amount of time to wait for the Job to reconnect to the cluster via Knox.
- Optional: Click Check services to verify that Talend Studio can connect to the services you have specified in this wizard.
- Optional: Click Export as context to create a new context with these data and save it in the repository.
- Click Finish to validate your changes and close the wizard.
The newly set-up Hadoop connection displays under the Hadoop cluster folder in the Repository tree view.
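The Knox URL rule given in the procedure above can be checked before you paste a value into the wizard. The following is a minimal sketch, not part of Talend Studio: the helper name `to_base_knox_url` and the example host and Data Hub names are hypothetical. It relies only on the documented format https://<host>/<datahub>/cdp-proxy-api and always trims the URL to that base form, since the form without a suffix is accepted both before and after the R2021-08 patch.

```python
from urllib.parse import urlsplit

# Path segment that must follow the Data Hub name, per the documented format.
PROXY_SEGMENT = "cdp-proxy-api"


def to_base_knox_url(url):
    """Trim a Knox URL to https://<host>/<datahub>/cdp-proxy-api.

    Hypothetical helper for illustration; it checks the shape of the URL
    only and does not contact the cluster.
    """
    parts = urlsplit(url.strip())
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("Knox URL must start with https://<host>")

    # Split the path into non-empty segments: <datahub>, cdp-proxy-api, ...
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[1] != PROXY_SEGMENT:
        raise ValueError("expected /<datahub>/cdp-proxy-api after the host")

    # Drop /livy or any other suffix that follows cdp-proxy-api.
    return "https://{}/{}/{}".format(parts.netloc, segments[0], PROXY_SEGMENT)


# Example with made-up host and Data Hub names.
print(to_base_knox_url("https://knox.example.com/my-datahub/cdp-proxy-api/livy/"))
```

Trimming unconditionally avoids having to know which monthly patch is installed. If you are on R2021-08 or later and prefer to keep /livy, you can enter the URL as copied from the Endpoints section instead.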