Talend Studio allows you to authenticate your Spark Streaming and Spark Batch Jobs
through Knox with a CDP Public Cloud Data Hub instance in YARN cluster mode. You can
enter the Knox connection parameters either in the Spark configuration tab of
the Run view of your Job or in the Hadoop Cluster
Connection metadata wizard. This configuration is effective on a per-Job
basis.
In this scenario, the configuration via the Hadoop Cluster
Connection metadata wizard is used. Setting up the connection to Knox in
the Repository allows you to avoid configuring that connection
each time you need it in the Spark Configuration view of your
Spark Jobs.
For more information about the configuration via the Spark
configuration tab of the Run view of your Job,
see Defining the Cloudera connection parameters.
The information in this section applies only to users who have subscribed to
Talend Data Fabric or to any Talend product with Big Data. It does not
apply to Talend Open Studio for Big Data users.
Procedure
-
In the Repository tree view of your Studio, expand
Metadata and then right-click Hadoop
cluster.
-
Select Create Hadoop cluster from the contextual menu to
open the Hadoop Cluster Connection wizard.
-
Fill in generic information about this connection, such as
Name and Description and click
Next to open the Hadoop Configuration
Import Wizard window that allows you to select the distribution
to be used and the manual or the automatic mode to configure the connection.
Important: Knox is supported only with CDP 7.1 and later versions.
-
Select Cloudera from the
Distribution drop-down list and Cloudera
CDP 7.1 from the Version drop-down
list.
-
Select Enter manually Hadoop services and click
Finish.
-
Select the Use Knox check box and enter the Knox related
connection parameters:
-
Knox URL: enter the Knox URL in the following format:
https://<host>/<datahub>/cdp-proxy-api
You can find the Knox URL in the Cloudera Management Console, in the
Endpoints section of your Data Hub, under Livy Server.
Important: If you have the R2021-07 or an earlier patch installed, the URL
should not include /livy or any other suffix after cdp-proxy-api. If you
have the R2021-08 or a later patch installed, the URL works with or
without /livy at the end.
-
Knox user: enter your Workload User Name from
Cloudera Management Console.
-
Knox password: enter your Workload Password from
Cloudera Management Console.
-
Knox directory: enter the location in HDFS where the
loaded file is stored.
-
Knox session timeout: specify the amount of time
to wait for the Job to reconnect to the cluster via Knox.
- Optional:
Click Check services to verify that Talend Studio can connect to the services you have specified in this wizard.
- Optional:
Click Export as context to create a new context with
these data and save it in the repository.
-
Click Finish to validate your changes and close the
wizard.
The newly set-up Hadoop connection displays under the Hadoop
cluster folder in the Repository tree
view.
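The Knox URL rule from the procedure above can be checked before you paste the value into the wizard. The following is a minimal Python sketch, not part of Talend Studio: the function names knox_url_for_patch and parse_patch and the example host are hypothetical, and the sketch assumes that patch names always follow the RYYYY-MM pattern used in this section. It only encodes the documented format https://<host>/<datahub>/cdp-proxy-api and the R2021-07 / R2021-08 rule.

```python
import re
from typing import Tuple
from urllib.parse import urlsplit

# Assumption: Studio monthly patch names look like "R2021-07".
_PATCH_RE = re.compile(r"^R(\d{4})-(\d{2})$")
# Path must be /<datahub>/cdp-proxy-api, optionally followed by a suffix such as /livy.
_PATH_RE = re.compile(r"^/(?P<datahub>[^/]+)/cdp-proxy-api(?P<suffix>(/[^/]+)*)/?$")


def parse_patch(patch: str) -> Tuple[int, int]:
    """Turn a patch name such as 'R2021-07' into a comparable (year, month) pair."""
    match = _PATCH_RE.match(patch)
    if match is None:
        raise ValueError(f"Unexpected patch name: {patch!r}")
    return int(match.group(1)), int(match.group(2))


def knox_url_for_patch(url: str, patch: str) -> str:
    """Return the Knox URL to enter in the wizard for the given patch level.

    Hypothetical helper: for R2021-07 or earlier, any suffix after
    cdp-proxy-api (for example /livy) is removed; for R2021-08 or later,
    the suffix is kept because the URL works with or without it.
    """
    parts = urlsplit(url.strip())
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("Expected https://<host>/<datahub>/cdp-proxy-api")
    match = _PATH_RE.match(parts.path)
    if match is None:
        raise ValueError("Expected https://<host>/<datahub>/cdp-proxy-api")
    base = f"https://{parts.netloc}/{match.group('datahub')}/cdp-proxy-api"
    if parse_patch(patch) <= (2021, 7):
        return base
    return base + match.group("suffix")


if __name__ == "__main__":
    # Placeholder host and Data Hub name, for illustration only.
    copied = "https://gateway.example.com/my-datahub/cdp-proxy-api/livy"
    print(knox_url_for_patch(copied, "R2021-07"))
    print(knox_url_for_patch(copied, "R2021-08"))
```

With the URL copied from the Livy Server endpoint, the first call drops the /livy suffix and the second call leaves it in place. A URL that does not use https or does not contain /<datahub>/cdp-proxy-api raises ValueError.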