Setting up the Knox parameters with CDP Public Cloud Data Hub - 7.3

Cloudera

Version: 7.3
Language: English (United States)
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Open Studio for Big Data, Talend Open Studio for Data Integration, Talend Open Studio for ESB, Talend Real-Time Big Data Platform
Module: Talend Studio

Talend Studio allows your Spark Streaming and Spark Batch Jobs to authenticate through Knox when they run on a CDP Public Cloud Data Hub instance in YARN cluster mode. You can define the Knox connection parameters either in the Spark configuration tab of the Run view of your Job or in the Hadoop Cluster Connection metadata wizard. This configuration is effective on a per-Job basis.

This scenario uses the Hadoop Cluster Connection metadata wizard. Setting up the Knox connection in the Repository saves you from configuring it again in the Spark configuration tab of each Spark Job that needs it.

For more information about the configuration via the Spark configuration tab of the Run view of your Job, see Defining the Cloudera connection parameters.

The information in this section is only for users who have subscribed to Talend Data Fabric or to any Talend product with Big Data. It does not apply to Talend Open Studio for Big Data users.

Procedure

  1. In the Repository tree view of your Studio, expand Metadata and then right-click Hadoop cluster.
  2. Select Create Hadoop cluster from the contextual menu to open the Hadoop Cluster Connection wizard.
  3. Fill in the generic information about this connection, such as Name and Description, and click Next. The Hadoop Configuration Import Wizard window opens, in which you select the distribution to use and choose between the manual and the automatic mode to configure the connection.
    Important: Knox is supported only with CDP 7.1 and later.
  4. Select Cloudera from the Distribution drop-down list and Cloudera CDP 7.1 from the Version drop-down list.
  5. Select Enter manually Hadoop services and click Finish.
  6. Select the Use Knox check box and enter the Knox-related connection parameters:
    • Knox URL: enter the Knox URL in the following format: https://<host>/<datahub>/cdp-proxy-api. You can find the Knox URL on the Cloudera Management Console, in the Endpoints section of your Data Hub, under Livy Server.
      Important: If you have the R2021-07 patch or an earlier one installed, the URL must end with cdp-proxy-api and must not include /livy or any other suffix after it. If you have the R2021-08 patch or a later one installed, the URL works with or without /livy at the end.
    • Knox user: enter your Workload User Name from Cloudera Management Console.
    • Knox password: enter your Workload Password from Cloudera Management Console.
    • Knox directory: enter the HDFS location in which the loaded file is stored.
  7. Optional: Click Check services to verify that Talend Studio can connect to the services you have specified in this wizard.
  8. Optional: Click Export as context to create a new context from these connection parameters and save it in the Repository.
  9. Click Finish to validate your changes and close the wizard.
    The newly created Hadoop connection is displayed under the Hadoop cluster folder in the Repository tree view.
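
Example

The Knox URL format described in step 6 can be checked outside Talend Studio before you paste the value into the wizard. The following Python sketch is not part of Talend Studio and does not reproduce what Check services does. It is a minimal illustration under stated assumptions: the function names normalize_knox_url and build_livy_probe are hypothetical, the host and Data Hub names are placeholders, and the probe assumes that the Knox gateway accepts HTTP Basic authentication with the workload credentials and exposes the Livy REST endpoint GET /sessions under /livy. The sketch only builds the request; it does not send it.

```python
import base64
import re
import urllib.request

# Assumed shape, taken from the format quoted in step 6:
# https://<host>/<datahub>/cdp-proxy-api, optionally followed by /livy.
KNOX_URL_PATTERN = re.compile(
    r"^https://(?P<host>[^/\s]+)/(?P<datahub>[^/\s]+)/cdp-proxy-api(?P<livy>/livy)?/?$"
)


def normalize_knox_url(url: str, allow_livy_suffix: bool = True) -> str:
    """Validate a Knox URL and return it without a trailing slash.

    With allow_livy_suffix=False, a trailing /livy is stripped so that the
    URL ends at cdp-proxy-api (the form required up to the R2021-07 patch).
    """
    match = KNOX_URL_PATTERN.match(url.strip())
    if match is None:
        raise ValueError(f"Unexpected Knox URL format: {url!r}")
    base = f"https://{match['host']}/{match['datahub']}/cdp-proxy-api"
    if match["livy"] and allow_livy_suffix:
        return base + "/livy"
    return base


def build_livy_probe(knox_url: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the Livy session list.

    Assumption: the gateway accepts HTTP Basic authentication with the
    Workload User Name and Workload Password.
    """
    base = normalize_knox_url(knox_url, allow_livy_suffix=False)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base + "/livy/sessions")
    request.add_header("Authorization", "Basic " + token)
    return request


if __name__ == "__main__":
    # Placeholder values; replace them with the endpoint of your Data Hub.
    url = "https://knox.example.com/my-datahub/cdp-proxy-api/livy"
    print(normalize_knox_url(url, allow_livy_suffix=False))
    print(build_livy_probe(url, "workload-user", "workload-password").full_url)
```

Stripping the /livy suffix instead of rejecting it is a design choice of this sketch: it yields a value that is acceptable for both patch levels mentioned in step 6. To actually test connectivity, you could pass the returned request to urllib.request.urlopen from a machine that can reach the gateway.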