To enable exports to HDFS for Hive datasets in a Kerberos-secured Cloudera environment, you must edit the Spark Job Server configuration files.
Important: Make sure that the keytab file used to authenticate to HDFS is accessible to all workers on the cluster.
Procedure
-
Create a <sjs_path>/jobserver_gss.conf file, and add
the following configuration parameters:
com.sun.security.jgss.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=false
    doNotPrompt=true
    useKeyTab=true
    keyTab="/path/to/the/keytab/keytab_file.keytab"
    principal="your@principalHere"
    debug=true;
};
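To confirm that this keytab and principal actually work before restarting the Spark Job Server, you can run a quick check with the standard Kerberos client tools; the path and principal below are the placeholders from the example above.
# Request a ticket with the keytab and principal declared in jobserver_gss.conf
kinit -kt /path/to/the/keytab/keytab_file.keytab your@principalHere
# Display the cached ticket to verify the principal and expiry time
klist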
-
In the <sjs_path>/manager_start.sh file, set these parameters to the following values so that they reference the previously created
<sjs_path>/jobserver_gss.conf file:
KRB5_OPTS="-Djava.security.auth.login.config=jobserver_gss.conf
-Djava.security.krb5.debug=true
-Djava.security.krb5.conf=/path/to/krb5.conf
-Djavax.security.auth.useSubjectCredsOnly=false"
--conf "spark.executor.extraJavaOptions=$LOGGING_OPTS $KRB5_OPTS"
--conf "spark.yarn.dist.files=/path/to/jobserver_gss.conf"
--proxy-user $4
--driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES $JDBC_PROPERTIES $KRB5_OPTS"
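These flags are appended to the spark-submit command that manager_start.sh builds to launch the Spark Job Server. For orientation only, a simplified sketch of where they fit is shown below; apart from the flags and the KRB5_OPTS variable defined above, everything in this excerpt is a hypothetical placeholder rather than the actual script content.
# Hypothetical, simplified excerpt of the spark-submit invocation in manager_start.sh;
# the remaining arguments (main class, memory settings, job server jar) are omitted.
$SPARK_HOME/bin/spark-submit \
  --conf "spark.executor.extraJavaOptions=$LOGGING_OPTS $KRB5_OPTS" \
  --conf "spark.yarn.dist.files=/path/to/jobserver_gss.conf" \
  --proxy-user $4 \
  --driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES $JDBC_PROPERTIES $KRB5_OPTS"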
-
When importing your dataset into Talend Data Preparation, the JDBC URL used to
connect to Hive must follow this model:
jdbc:hive2://host:10000/default;principal=<your_principal>
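For example, with a hypothetical HiveServer2 host and Kerberos realm (both names below are placeholders), the URL would look like:
jdbc:hive2://hiveserver.example.com:10000/default;principal=hive/hiveserver.example.com@EXAMPLE.COM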
-
Copy the
<components_catalog_path>/config/jdbc_config.json
file that contains the Hive driver to the Spark Job Server installation
folder.
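Assuming the Spark Job Server installation folder is <sjs_path>, this step could be performed with a command such as:
# Copy the Hive driver configuration into the Spark Job Server installation folder
cp <components_catalog_path>/config/jdbc_config.json <sjs_path>/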
-
Copy the .jar files from the
<components_catalog_path>/.m2 folder to the
<sjs_path>/datastreams-deps folder.
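If the .m2 folder follows the usual Maven repository layout with nested directories, a recursive copy is needed; a possible command, using the same placeholder paths, is:
# Collect every .jar from the Components Catalog Maven repository
# and copy it flat into the Spark Job Server dependencies folder
find <components_catalog_path>/.m2 -name "*.jar" -exec cp {} <sjs_path>/datastreams-deps/ \;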
Results
You can now export your Hive datasets to HDFS.