Exporting a Kerberos-secured Hive dataset to HDFS - 6.5

Talend Data Preparation User Guide

Version: 6.5
Products:
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Platform: Talend Data Preparation
Task: Data Quality and Preparation > Cleansing data

To enable exports of Hive datasets to HDFS in a Kerberos-secured Cloudera environment, you must edit the Spark Job Server configuration files.

Important: Make sure that the keytab file used to authenticate to HDFS is accessible to all the workers on the cluster.
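For example, a quick way to check this on each worker node is to run the standard Kerberos tools against the keytab; the path and principal below are the placeholders used in this procedure:

    # List the principals stored in the keytab
    klist -k -t /path/to/the/keytab/keytab_file.keytab
    # Obtain a ticket non-interactively to confirm that the keytab is valid
    kinit -kt /path/to/the/keytab/keytab_file.keytab your@principalHere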

Procedure

  1. Create a <sjs_path>/jobserver_gss.conf file, and add the following configuration parameters:
    com.sun.security.jgss.initiate {
      com.sun.security.auth.module.Krb5LoginModule required
      useTicketCache=false
      doNotPrompt=true
      useKeyTab=true
      keyTab="/path/to/the/keytab/keytab_file.keytab"
      principal="your@principalHere"
      debug=true;
    };
  2. In the <sjs_path>/manager_start.sh file, set the following parameters to reference the <sjs_path>/jobserver_gss.conf file created in step 1 (a sketch of where these options fit in the spark-submit command is shown after this procedure):
    KRB5_OPTS="-Djava.security.auth.login.config=jobserver_gss.conf
     -Djava.security.krb5.debug=true
     -Djava.security.krb5.conf=/path/to/krb5.conf
     -Djavax.security.auth.useSubjectCredsOnly=false"
     --conf "spark.executor.extraJavaOptions=$LOGGING_OPTS $KRB5_OPTS"
     --conf "spark.yarn.dist.files=/path/to/jobserver_gss.conf"
     --proxy-user $4
     --driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES $JDBC_PROPERTIES $KRB5_OPTS"
  3. When importing your dataset into Talend Data Preparation, make sure that the JDBC URL used to connect to Hive follows this model (see the example after this procedure): jdbc:hive2://<host>:10000/default;principal=<your_principal>
  4. Copy the <components_catalog_path>/config/jdbc_config.json file that contains the Hive driver to the Spark Job Server installation folder.
  5. Copy the .jar files from the <components_catalog_path>/.m2 folder to the <sjs_path>/datastreams-deps folder (example commands for steps 4 and 5 are shown after this procedure).
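For reference, here is a minimal sketch of where the options from step 2 sit in the spark-submit invocation inside manager_start.sh. Only KRB5_OPTS and the --conf, --proxy-user, and --driver-java-options lines come from this procedure; the other names ($SPARK_HOME, $MAIN, $appdir, $conffile) are assumptions based on a typical Spark Job Server start script, so check them against your own file:

    # Sketch only: $SPARK_HOME, $MAIN, $appdir, and $conffile are assumed names
    cmd='$SPARK_HOME/bin/spark-submit --class $MAIN
      --conf "spark.executor.extraJavaOptions=$LOGGING_OPTS $KRB5_OPTS"
      --conf "spark.yarn.dist.files=/path/to/jobserver_gss.conf"
      --proxy-user $4
      --driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES $JDBC_PROPERTIES $KRB5_OPTS"
      $appdir/spark-job-server.jar $conffile'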
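As an example of the JDBC URL model from step 3, with a hypothetical HiveServer2 host and Kerberos realm:

    # Hypothetical host and realm; the principal is the HiveServer2 service principal
    jdbc:hive2://hive-node1.example.com:10000/default;principal=hive/hive-node1.example.com@EXAMPLE.COM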
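The copy operations in steps 4 and 5 might look as follows, where <components_catalog_path> and <sjs_path> stand for your actual installation paths; the find command flattens the nested .m2 repository layout into the target folder:

    # Step 4: copy the JDBC configuration to the Spark Job Server folder
    cp <components_catalog_path>/config/jdbc_config.json <sjs_path>/
    # Step 5: copy every .jar from the .m2 repository into datastreams-deps
    find <components_catalog_path>/.m2 -name "*.jar" -exec cp {} <sjs_path>/datastreams-deps/ \;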

Results

You can now export your Hive datasets to HDFS.