Exporting a Kerberos-secured Hive dataset to HDFS - 8.0

Talend Data Preparation User Guide

Version: 8.0
Language: English
Product:
  Talend Big Data
  Talend Big Data Platform
  Talend Data Fabric
  Talend Data Integration
  Talend Data Management Platform
  Talend Data Services Platform
  Talend ESB
  Talend MDM Platform
  Talend Real-Time Big Data Platform
Module: Talend Data Preparation
Content: Data Quality and Preparation > Cleansing data
Last publication date: 2024-03-26

To enable exports of Hive datasets to HDFS in a Kerberos-secured Cloudera environment, you must edit the Spark Job Server configuration files.

Important: Make sure that the keytab file used to authenticate to HDFS is accessible to all the workers on the cluster.
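
For example, assuming the keytab is deployed at the same local path on every node, you can check from an edge node that each worker can read it. The worker hostnames below are placeholders, and the check requires SSH access to the nodes:

    # Hypothetical worker hostnames; replace with the nodes of your cluster.
    for host in worker1 worker2 worker3; do
      # Confirm that the keytab exists, is readable, and lists the expected principal.
      ssh "$host" "ls -l /path/to/the/keytab/keytab_file.keytab && klist -kt /path/to/the/keytab/keytab_file.keytab"
    done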

Procedure

  1. Create a <sjs_path>/jobserver_gss.conf file, and add the following configuration parameters (a verification sketch follows this procedure):
    com.sun.security.jgss.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=false
    doNotPrompt=true
    useKeyTab=true
    keyTab="/path/to/the/keytab/keytab_file.keytab"
    principal="your@principalHere"
    debug=true;
    };
  2. In the <sjs_path>/manager_start.sh file, set these parameters with the following values to reference the previously created <sjs_path>/jobserver_gss.conf file:
    KRB5_OPTS="-Djava.security.auth.login.config=jobserver_gss.conf
     -Djava.security.krb5.debug=true
     -Djava.security.krb5.conf=/path/to/krb5.conf
     -Djavax.security.auth.useSubjectCredsOnly=false"
     --conf "spark.executor.extraJavaOptions=$LOGGING_OPTS $KRB5_OPTS"
     --conf "spark.yarn.dist.files=/path/to/jobserver_gss.conf"
     --proxy-user $4
     --driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES $JDBC_PROPERTIES $KRB5_OPTS"
  3. When you import your dataset into Talend Data Preparation, the JDBC URL used to connect to Hive must follow this model (a filled-in example follows this procedure):
    jdbc:hive2://host:10000/default;principal=<your_principal>
  4. Copy the <components_catalog_path>/config/jdbc_config.json file that contains the Hive driver configuration to the Spark Job Server installation folder.
  5. Copy the .jar files from the <components_catalog_path>/.m2 folder to the <sjs_path>/datastreams-deps folder. Example commands for steps 4 and 5 are sketched after this procedure.
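
As a sanity check for step 1, you can confirm that the keytab and principal referenced in jobserver_gss.conf actually authenticate. The path and principal below are the same placeholders used in the configuration file:

    # Obtain a Kerberos ticket with the keytab; success means the keytab and principal match.
    kinit -kt /path/to/the/keytab/keytab_file.keytab your@principalHere
    # Display the ticket that was granted.
    klist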
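
For step 3, a filled-in URL could look like the following. The host, port, and principal are hypothetical; use the values of your own HiveServer2 instance. If beeline is available, you can also use the same URL to test the connection before importing the dataset (a valid Kerberos ticket is required):

    jdbc:hive2://hiveserver.example.com:10000/default;principal=hive/hiveserver.example.com@EXAMPLE.COM

    # Optional connectivity check with beeline, using the same URL.
    beeline -u "jdbc:hive2://hiveserver.example.com:10000/default;principal=hive/hiveserver.example.com@EXAMPLE.COM"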
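
Steps 4 and 5 amount to the following shell commands. The COMPONENTS_CATALOG_PATH and SJS_PATH variables are assumptions standing in for <components_catalog_path> and <sjs_path>; set them to your actual installation folders:

    # Step 4: copy the Hive driver configuration next to the Spark Job Server.
    cp "$COMPONENTS_CATALOG_PATH/config/jdbc_config.json" "$SJS_PATH/"

    # Step 5: copy every .jar from the Components Catalog local repository
    # into the Spark Job Server dependencies folder.
    find "$COMPONENTS_CATALOG_PATH/.m2" -name "*.jar" -exec cp {} "$SJS_PATH/datastreams-deps/" \;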

Results

You can now export your Hive datasets to HDFS.