Exporting a Kerberos-secured Hive dataset to HDFS - 8.0

Talend Data Preparation User Guide

Version: 8.0
Language: English
Product:
  Talend Big Data
  Talend Big Data Platform
  Talend Data Fabric
  Talend Data Integration
  Talend Data Management Platform
  Talend Data Services Platform
  Talend ESB
  Talend MDM Platform
  Talend Real-Time Big Data Platform
Module: Talend Data Preparation
Content: Data Quality and Preparation > Cleansing data
Last publication date: 2024-03-26

To enable exports of Hive datasets to HDFS in a Kerberos-secured Cloudera environment, you must edit the Spark Job Server configuration files.

Important: Make sure that the keytab file used to authenticate to HDFS is accessible to all the workers on the cluster.
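
For example, assuming the keytab is deployed at the same local path on every node, you can check from an edge node that each worker can read it. The worker hostnames below are placeholders, and the check requires SSH access to the nodes:

    # Hypothetical worker hostnames; replace with the nodes of your cluster.
    for host in worker1 worker2 worker3; do
      # Confirm that the keytab exists, is readable, and lists the expected principal.
      ssh "$host" "ls -l /path/to/the/keytab/keytab_file.keytab && klist -kt /path/to/the/keytab/keytab_file.keytab"
    done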

Procedure

  1. Create a <sjs_path>/jobserver_gss.conf file, and add the following configuration parameters (a verification sketch follows this procedure):
    com.sun.security.jgss.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=false
    doNotPrompt=true
    useKeyTab=true
    keyTab="/path/to/the/keytab/keytab_file.keytab"
    principal="your@principalHere"
    debug=true;
    };
  2. In the <sjs_path>/manager_start.sh file, set these parameters with the following values to reference the previously created <sjs_path>/jobserver_gss.conf file:
    KRB5_OPTS="-Djava.security.auth.login.config=jobserver_gss.conf
     -Djava.security.krb5.debug=true
     -Djava.security.krb5.conf=/path/to/krb5.conf
     -Djavax.security.auth.useSubjectCredsOnly=false"
     --conf "spark.executor.extraJavaOptions=$LOGGING_OPTS $KRB5_OPTS"
     --conf "spark.yarn.dist.files=/path/to/jobserver_gss.conf"
     --proxy-user $4
     --driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES $JDBC_PROPERTIES $KRB5_OPTS"
  3. When you import your dataset into Talend Data Preparation, the JDBC URL used to connect to Hive must follow this model (a filled-in example follows this procedure):
    jdbc:hive2://host:10000/default;principal=<your_principal>
  4. Copy the <components_catalog_path>/config/jdbc_config.json file that contains the Hive driver configuration to the Spark Job Server installation folder.
  5. Copy the .jar files from the <components_catalog_path>/.m2 folder to the <sjs_path>/datastreams-deps folder. Example commands for steps 4 and 5 are sketched after this procedure.
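
As a sanity check for step 1, you can confirm that the keytab and principal referenced in jobserver_gss.conf actually authenticate. The path and principal below are the same placeholders used in the configuration file:

    # Obtain a Kerberos ticket with the keytab; success means the keytab and principal match.
    kinit -kt /path/to/the/keytab/keytab_file.keytab your@principalHere
    # Display the ticket that was granted.
    klist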
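
For step 3, a filled-in URL could look like the following. The host, port, and principal are hypothetical; use the values of your own HiveServer2 instance. If beeline is available, you can also use the same URL to test the connection before importing the dataset (a valid Kerberos ticket is required):

    jdbc:hive2://hiveserver.example.com:10000/default;principal=hive/hiveserver.example.com@EXAMPLE.COM

    # Optional connectivity check with beeline, using the same URL.
    beeline -u "jdbc:hive2://hiveserver.example.com:10000/default;principal=hive/hiveserver.example.com@EXAMPLE.COM"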
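
Steps 4 and 5 amount to the following shell commands. The COMPONENTS_CATALOG_PATH and SJS_PATH variables are assumptions standing in for <components_catalog_path> and <sjs_path>; set them to your actual installation folders:

    # Step 4: copy the Hive driver configuration next to the Spark Job Server.
    cp "$COMPONENTS_CATALOG_PATH/config/jdbc_config.json" "$SJS_PATH/"

    # Step 5: copy every .jar from the Components Catalog local repository
    # into the Spark Job Server dependencies folder.
    find "$COMPONENTS_CATALOG_PATH/.m2" -name "*.jar" -exec cp {} "$SJS_PATH/datastreams-deps/" \;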

Results

You can now export your Hive datasets to HDFS.