Talend Studio
lets your Spark Streaming and Spark Batch Jobs authenticate through Knox to
a CDP Public Cloud Data Hub instance in YARN cluster mode. You can enter the Knox
connection parameters either in the Spark configuration tab of
the Run view of your Job or in the Hadoop Cluster
Connection metadata wizard. This configuration applies on a per-Job
basis.
In this scenario, the Hadoop Cluster
Connection metadata wizard is used. Setting up the connection to Knox in
the Repository saves you from configuring that connection
again in the Spark configuration tab of each of your
Spark Jobs.
For more information about the configuration via the Spark
configuration tab of the Run view of your Job,
see Defining the Cloudera connection parameters.
The information in this section is only for users who have subscribed to
Talend Data Fabric
or to any Talend product
with Big Data.
Procedure
-
In the Repository tree view of Talend Studio,
expand Metadata and then right-click Hadoop
cluster.
-
Select Create Hadoop cluster from the contextual menu to
open the Hadoop Cluster Connection wizard.
-
Fill in the generic information about this connection, such as
Name and Description, and click
Next. The Hadoop Configuration
Import Wizard window opens, in which you select the distribution
to use and whether to configure the connection manually or automatically.
Important: Knox is supported only with CDP 7.1 and later.
-
Select Cloudera from the
Distribution drop-down list and Cloudera
CDP 7.1 from the Version drop-down
list.
-
Select Enter manually Hadoop services and click
Finish.
-
Select the Use Knox check box and enter the Knox-related
connection parameters.
- Optional:
Click Check services to verify that Talend Studio can connect to the services you have specified in this wizard.
- Optional:
Click Export as context to create a new context with
these data and save it in the repository.
-
Click Finish to validate your changes and close the
wizard.
The new Hadoop connection is displayed under the Hadoop
cluster folder in the Repository tree
view.
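If you want to confirm outside Talend Studio that the Knox gateway accepts your credentials, a rough check can be scripted. The following is a minimal sketch, not a description of what Check services does internally. It assumes that the gateway topology exposes WebHDFS and accepts HTTP basic authentication. The gateway URL, user name, and password shown in the usage note are hypothetical placeholders to replace with your own values.

```python
import base64
import urllib.request


def build_knox_request(gateway_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for a WebHDFS root listing proxied by Knox.

    Assumption: WebHDFS is exposed under <gateway_url>/webhdfs/v1/.
    """
    url = gateway_url.rstrip("/") + "/webhdfs/v1/?op=LISTSTATUS"
    # HTTP basic authentication: base64("user:password") in the Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def knox_is_reachable(gateway_url: str, username: str, password: str, timeout: float = 10.0) -> bool:
    """Return True if the gateway answers the listing request with HTTP 200."""
    request = build_knox_request(gateway_url, username, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError (for example 401 or 404) are subclasses of OSError.
        return False
```

For example, `knox_is_reachable("https://knox.example.com/my-datahub/cdp-proxy-api", "alice", "secret")` returns False when the host cannot be reached or the credentials are rejected. The request construction is kept separate from the network call so that the URL and header can be inspected without contacting a cluster.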