Exporting a Kerberos-secured Hive dataset to HDFS - 6.5

Talend Data Preparation User Guide

Version: 6.5
Products:
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Platform: Talend Data Preparation
Task: Data Quality and Preparation > Cleansing data

To enable exports of Hive datasets to HDFS in a Kerberos-secured Cloudera environment, you must edit the Spark Job Server configuration files.

Important: Make sure that the keytab file used to authenticate to HDFS is accessible to all the workers on the cluster.
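For example, a quick way to check this on each worker node is to run the standard Kerberos tools against the keytab; the path and principal below are the placeholders used in this procedure:

    # List the principals stored in the keytab
    klist -k -t /path/to/the/keytab/keytab_file.keytab
    # Obtain a ticket non-interactively to confirm that the keytab is valid
    kinit -kt /path/to/the/keytab/keytab_file.keytab your@principalHere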

Procedure

  1. Create a <sjs_path>/jobserver_gss.conf file, and add the following configuration parameters:
    com.sun.security.jgss.initiate {
      com.sun.security.auth.module.Krb5LoginModule required
      useTicketCache=false
      doNotPrompt=true
      useKeyTab=true
      keyTab="/path/to/the/keytab/keytab_file.keytab"
      principal="your@principalHere"
      debug=true;
    };
  2. In the <sjs_path>/manager_start.sh file, set the following parameters to reference the <sjs_path>/jobserver_gss.conf file created in step 1 (a sketch of where these options fit in the spark-submit command is shown after this procedure):
    KRB5_OPTS="-Djava.security.auth.login.config=jobserver_gss.conf
     -Djava.security.krb5.debug=true
     -Djava.security.krb5.conf=/path/to/krb5.conf
     -Djavax.security.auth.useSubjectCredsOnly=false"
     --conf "spark.executor.extraJavaOptions=$LOGGING_OPTS $KRB5_OPTS"
     --conf "spark.yarn.dist.files=/path/to/jobserver_gss.conf"
     --proxy-user $4
     --driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES $JDBC_PROPERTIES $KRB5_OPTS"
  3. When importing your dataset into Talend Data Preparation, make sure that the JDBC URL used to connect to Hive follows this model (see the example after this procedure): jdbc:hive2://<host>:10000/default;principal=<your_principal>
  4. Copy the <components_catalog_path>/config/jdbc_config.json file that contains the Hive driver to the Spark Job Server installation folder.
  5. Copy the .jar files from the <components_catalog_path>/.m2 folder to the <sjs_path>/datastreams-deps folder (example commands for steps 4 and 5 are shown after this procedure).
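For reference, here is a minimal sketch of where the options from step 2 sit in the spark-submit invocation inside manager_start.sh. Only KRB5_OPTS and the --conf, --proxy-user, and --driver-java-options lines come from this procedure; the other names ($SPARK_HOME, $MAIN, $appdir, $conffile) are assumptions based on a typical Spark Job Server start script, so check them against your own file:

    # Sketch only: $SPARK_HOME, $MAIN, $appdir, and $conffile are assumed names
    cmd='$SPARK_HOME/bin/spark-submit --class $MAIN
      --conf "spark.executor.extraJavaOptions=$LOGGING_OPTS $KRB5_OPTS"
      --conf "spark.yarn.dist.files=/path/to/jobserver_gss.conf"
      --proxy-user $4
      --driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES $JDBC_PROPERTIES $KRB5_OPTS"
      $appdir/spark-job-server.jar $conffile'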
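As an example of the JDBC URL model from step 3, with a hypothetical HiveServer2 host and Kerberos realm:

    # Hypothetical host and realm; the principal is the HiveServer2 service principal
    jdbc:hive2://hive-node1.example.com:10000/default;principal=hive/hive-node1.example.com@EXAMPLE.COM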
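The copy operations in steps 4 and 5 might look as follows, where <components_catalog_path> and <sjs_path> stand for your actual installation paths; the find command flattens the nested .m2 repository layout into the target folder:

    # Step 4: copy the JDBC configuration to the Spark Job Server folder
    cp <components_catalog_path>/config/jdbc_config.json <sjs_path>/
    # Step 5: copy every .jar from the .m2 repository into datastreams-deps
    find <components_catalog_path>/.m2 -name "*.jar" -exec cp {} <sjs_path>/datastreams-deps/ \;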

Results

You can now export your Hive datasets to HDFS.