
Importing Hive or HDFS datasets on a multi-node cluster

To enable the import of Hive or HDFS datasets stored on a multi-node cluster, you must edit the Components Catalog configuration files.

Important: Make sure that the keytab file used to authenticate to HDFS is accessible to all the workers on the cluster.
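A quick way to verify this on a given node is to attempt a keytab login with the Hadoop client libraries. The following is a minimal sketch, assuming hadoop-common is on the classpath; the principal and keytab path are placeholders that must match your environment:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KeytabLoginCheck {
        public static void main(String[] args) throws IOException {
            // Enable Kerberos authentication for the Hadoop client.
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Placeholder principal and keytab path: use the same values
            // as in the tcomp_gss.conf file described in the procedure below.
            UserGroupInformation.loginUserFromKeytab(
                "your@principalHere", "/path/to/the/keytab/keytab_file.keytab");

            System.out.println("Logged in as: " + UserGroupInformation.getLoginUser());
        }
    }

Run this check on each worker node; a login failure on any node indicates that the keytab is missing or unreadable there.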

Procedure

  1. Create a <components_catalog>/tcomp_gss.conf file, and add the following configuration parameters:
    com.sun.security.jgss.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=false
    doNotPrompt=true
    useKeyTab=true
    keyTab="/path/to/the/keytab/keytab_file.keytab"
    principal="your@principalHere"
    debug=true;
    };
  2. In the <components_catalog>/start.sh file, set the following parameters so that they reference the <components_catalog>/tcomp_gss.conf file created in the previous step:
    THE_CMD="$JAVA_BIN $SCRIPT_JAVA_OPTS -Djava.security.auth.login.config=/path/to/tcomp_gss.conf -Djava.security.krb5.debug=true
    -Djava.security.krb5.conf="/etc/krb5.conf" -Djavax.security.auth.useSubjectCredsOnly=false -cp
    \"$APP_CLASSPATH\" $APP_CLASS $*"
  3. When importing your dataset in Talend Data Preparation, the JDBC URL used to connect to Hive must follow this format (a connection sketch is shown after this procedure):
    jdbc:hive2://host:10000/default;principal=<your_principal>
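To check the Kerberized connection outside of Talend Data Preparation, you can open the same URL with the Hive JDBC driver directly. This is a minimal sketch, assuming the hive-jdbc driver and its dependencies are on the classpath; the host, port, principal, and file paths are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveKerberosConnectionCheck {
        public static void main(String[] args) throws Exception {
            // Same Java system properties as set in start.sh.
            System.setProperty("java.security.auth.login.config", "/path/to/tcomp_gss.conf");
            System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
            System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

            // Load the Hive JDBC driver.
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // URL format from step 3; replace host and <your_principal> with your values.
            String url = "jdbc:hive2://host:10000/default;principal=<your_principal>";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

If this listing succeeds, the JAAS configuration, Kerberos settings, and JDBC URL are consistent, and the import from Talend Data Preparation can use the same URL.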

Results

You can now import Hive or HDFS datasets stored on a multi-node cluster.
