To enable the import of Hive or HDFS datasets stored on a multi-node
cluster, you must edit the Components Catalog configuration files.
Important: Make sure that the keytab file used to authenticate to HDFS is accessible
to all the workers on the cluster.
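Before starting, you can verify this requirement with a quick check from a machine that has SSH access to the workers. This is only a sketch; the worker hostnames are placeholders:
for worker in worker1 worker2 worker3; do
  # klist -kt lists the principals stored in the keytab; a failure here means
  # the file is missing or unreadable on that node
  ssh "$worker" klist -kt /path/to/the/keytab/keytab_file.keytab
done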
Procedure
-
Create a <components_catalog>/tcomp_gss.conf file and
add the following JAAS configuration to it:
com.sun.security.jgss.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=false
    doNotPrompt=true
    useKeyTab=true
    keyTab="/path/to/the/keytab/keytab_file.keytab"
    principal="your@principalHere"
    debug=true;
};
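Before moving on, it is worth checking that the keytab and principal pair declared above can actually obtain a Kerberos ticket. A minimal check, assuming the MIT Kerberos client tools are installed on the host:
# list the principals stored in the keytab
klist -kt /path/to/the/keytab/keytab_file.keytab
# request a ticket using the keytab; success confirms the keytab/principal pair is valid
kinit -kt /path/to/the/keytab/keytab_file.keytab your@principalHere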
-
In the <components_catalog>/start.sh file, set the following
parameters to reference the previously created
<components_catalog>/tcomp_gss.conf file:
THE_CMD="$JAVA_BIN $SCRIPT_JAVA_OPTS -Djava.security.auth.login.config=/path/to/tcomp_gss.conf -Djava.security.krb5.debug=true -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -cp \"$APP_CLASSPATH\" $APP_CLASS $*"
-
When importing your dataset in Talend Data Preparation, the JDBC URL used to
connect to Hive must follow this format:
jdbc:hive2://host:10000/default;principal=<your_principal>
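You can validate this URL outside of Talend Data Preparation with Beeline, the command-line client shipped with Hive. This is only a sketch: the host and principal are placeholders, and a Kerberos ticket must be obtained first:
# obtain a ticket for the principal, then open a Kerberos-authenticated Hive session
kinit -kt /path/to/the/keytab/keytab_file.keytab your@principalHere
beeline -u "jdbc:hive2://host:10000/default;principal=<your_principal>" -e "show databases;"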
Results
You can now import Hive or HDFS datasets stored on a multi-node cluster.