Configuring a Kerberos-secured connection to Hive - 8.0

Talend Data Preparation User Guide

Version: 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Preparation
Content: Data Quality and Preparation > Cleansing data
Last publication date: 2024-03-26

Hive is one of the many databases that can be added to the list of data sources available for Talend Data Preparation.

The section Adding a new database type explains how to add new JDBC drivers to enrich the list of databases available from Talend Data Preparation. However, this specific example focuses on how to configure a direct connection from your Hive database to Talend Data Preparation. An additional configuration step allows you to secure this connection with Kerberos.

Before you begin

You have downloaded the Hive driver and added it to your <components_catalog_path>/.m2/jdbc-drivers/<database_name>/<jdbc_version> folder, as described in Adding a new database type.

Procedure

  1. In your Components Catalog installation folder, open the file config/settings.xml.
    By default, the Components Catalog installation folder is located at <TDP_installation_folder>/services/.
  2. Add the Cloudera repository to settings.xml.
    <settings>
        <profiles>
            <profile>
                <id>cloudera</id>
                <activation>
                    <activeByDefault>true</activeByDefault>
                </activation>
                <repositories>
                    <repository>
                        <id>cloudera</id>
                        <name>Cloudera repository</name>
                        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
                        <layout>default</layout>
                    </repository>
                </repositories>
            </profile>
        </profiles>
    </settings>
  3. Open <components_catalog_path>/config/jdbc_config.json and add the Hive driver.
    The compatible versions listed here may change, so use the following code as an example only. For more information, see the Cloudera documentation.
    {
        "id": "Hive",
        "class": "org.apache.hive.jdbc.HiveDriver",
        "url": "jdbc:hive2://host:10000/default;principal=<your principal>",
        "paths": [
            {"path": "mvn:org.apache.hive/hive-jdbc/3.1.2"},
            {"path": "mvn:commons-el/commons-el/1.0"},
            {"path": "mvn:org.datanucleus/datanucleus-core/4.1.6"},
            {"path": "mvn:asm/asm-commons/5.0.1"},
            {"path": "mvn:tomcat/jasper-compiler/5.5.23"},
            {"path": "mvn:org.apache.derby/derby/10.14.1.0"},
            {"path": "mvn:jline/jline/2.12"},
            {"path": "mvn:org.apache.commons/commons-compress/1.11"},
            {"path": "mvn:com.fasterxml.jackson.core/jackson-annotations/2.9.5"},
            {"path": "mvn:org.apache.hive/hive-metastore/2.1.1-cdh6.1.1"},
            {"path": "mvn:org.apache.hive/hive-shims/0.23-2.1.1-cdh6.1.1"},
            {"path": "mvn:org.apache.hive/hive-shims/2.1.1-cdh6.1.1"},
            {"path": "mvn:joda-time/joda-time/2.9.9"},
            {"path": "mvn:org.codehaus.jackson/jackson-mapper-asl/1.9.13-cloudera.1"},
            {"path": "mvn:com.google.code.findbugs/jsr305/3.0.0"},
            {"path": "mvn:org.apache.zookeeper/zookeeper/3.4.6"},
            {"path": "mvn:commons-pool/commons-pool/1.6"},
            {"path": "mvn:org.apache.avro/avro/1.8.2-cdh6.1.1"},
            {"path": "mvn:com.twitter/parquet-hadoop-bundle/1.9.0-cdh6.1.1"},
            {"path": "mvn:com.sun.jersey/jersey-servlet/1.19"},
            {"path": "mvn:commons-dbcp/commons-dbcp/1.4"},
            {"path": "mvn:org.slf4j/slf4j-api/1.7.25"},
            {"path": "mvn:javax.servlet.jsp/jsp-api/2.1"},
            {"path": "mvn:com.codahale.metrics/metrics-jvm/3.1.5"},
            {"path": "mvn:com.thoughtworks.paranamer/paranamer/2.8"},
            {"path": "mvn:tomcat/jasper-runtime/5.5.23"},
            {"path": "mvn:com.fasterxml.jackson.core/jackson-databind/2.9.5"},
            {"path": "mvn:asm/asm-tree/5.0.4"},
            {"path": "mvn:com.codahale.metrics/metrics-core/3.2.1"},
            {"path": "mvn:com.sun.jersey/jersey-core/1.19"},
            {"path": "mvn:org.apache.hive/hive-service/2.1.1-cdh6.1.1"},
            {"path": "mvn:org.jamon/jamon-runtime/2.4.1"},
            {"path": "mvn:com.sun.jersey/jersey-server/1.19"},
            {"path": "mvn:org.apache.commons/commons-lang3/3.8.1"},
            {"path": "mvn:com.codahale.metrics/metrics-json/3.2.1"},
            {"path": "mvn:org.apache.commons/commons-configuration2/2.1.1"},
            {"path": "mvn:org.apache.hive/hive-common/2.1.1-cdh6.1.1"},
            {"path": "mvn:org.apache.curator/curator-client/4.0.0"},
            {"path": "mvn:org.apache.thrift/libfb303/0.9.3"},
            {"path": "mvn:org.apache.thrift/libthrift/0.9.3"},
            {"path": "mvn:net.sf.opencsv/opencsv/2.3"},
            {"path": "mvn:commons-lang/commons-lang/2.6"},
            {"path": "mvn:com.fasterxml.jackson.core/jackson-core/2.9.5"},
            {"path": "mvn:org.tukaani/xz/1.6"},
            {"path": "mvn:com.jolbox/bonecp/0.8.0.RELEASE"},
            {"path": "mvn:org.apache.httpcomponents/httpcore/4.4.10"},
            {"path": "mvn:org.apache.hive/hive-serde/2.1.1-cdh6.1.1"},
            {"path": "mvn:commons-cli/commons-cli/1.4"},
            {"path": "mvn:com.google.guava/guava/14.0.1"},
            {"path": "mvn:org.apache.httpcomponents/httpclient/4.5.6"},
            {"path": "mvn:commons-codec/commons-codec/1.11"},
            {"path": "mvn:log4j/log4j/1.2-api-2.17.1"},
            {"path": "mvn:org.apache.ant/ant/1.9.1"},
            {"path": "mvn:org.datanucleus/datanucleus-rdbms/4.1.7"},
            {"path": "mvn:javax.transaction/jta/1.1"},
            {"path": "mvn:commons-logging/commons-logging/1.2"},
            {"path": "mvn:javax.servlet/servlet-api/2.5"},
            {"path": "mvn:org.apache.ant/ant-launcher/1.9.1"},
            {"path": "mvn:net.sf.jpam/jpam/1.1"},
            {"path": "mvn:org.codehaus.jackson/jackson-core-asl/1.9.13"},
            {"path": "mvn:org.datanucleus/datanucleus-api-jdo/4.2.1"},
            {"path": "mvn:org.apache.hive.shims/hive-shims-common/2.1.1-cdh6.1.1"},
            {"path": "mvn:javax.jdo/jdo-api/3.0.1"},
            {"path": "mvn:org.xerial.snappy/snappy-java/1.1.7.1"},
            {"path": "mvn:org.apache.curator/curator-framework/4.0.0"},
            {"path": "mvn:asm/asm/5.0.4"},
            {"path": "mvn:org.apache.hadoop/hadoop-yarn-server-resourcemanager/3.0.0-cdh6.1.1"},
            {"path": "mvn:org.apache.hadoop/hadoop-yarn-server-applicationhistoryservice/3.0.0-cdh6.1.1"},
            {"path": "mvn:org.apache.hadoop/hadoop-annotations/3.0.0-cdh6.1.1"},
            {"path": "mvn:org.apache.hadoop/hadoop-yarn-common/3.0.0-cdh6.1.1"},
            {"path": "mvn:org.apache.hadoop/hadoop-yarn-api/3.0.0-cdh6.1.1"},
            {"path": "mvn:org.apache.hadoop/hadoop-yarn-server-common/3.0.0-cdh6.1.1"},
            {"path": "mvn:org.apache.hadoop/hadoop-yarn-server-web-proxy/3.0.0-cdh6.1.1"},
            {"path": "mvn:org.apache.hadoop/hadoop-common/3.0.0-cdh6.1.1"},
            {"path": "mvn:org.apache.hadoop/hadoop-auth/3.0.0-cdh6.1.1"},
            {"path": "mvn:org.apache.hive/hive-service-rpc/2.1.1-cdh6.1.1"},
            {"path": "mvn:com.fasterxml.woodstox/woodstox-core/5.1.0"},
            {"path": "mvn:org.codehaus.woodstox/stax2-api/3.1.4"}
        ]
    }
  4. Open <components_catalog_path>/start.sh and add the following to the system properties: javax.security.auth.useSubjectCredsOnly=false.
    THE_CMD="$JAVA_BIN $JAVA_OPTS -Djavax.security.auth.useSubjectCredsOnly=false -cp \"$APP_CLASSPATH\" $APP_CLASS $*"
  5. Open <components_catalog_path>/config/application.properties and set the krb5.config property to the location of the Kerberos configuration file on the machine where the Components Catalog server is installed:

    Example

    krb5.config=/etc/krb5.conf
  6. Create a file named sun.conf in <components_catalog_path>/config/org/talend/daikon/sandbox/properties/.
    This file is needed to allow the Hive component to access specific system properties.
    Warning: If the directories in the path org/talend/daikon/sandbox/properties/ do not exist in <components_catalog_path>/config, create them.
  7. Add the following content to sun.conf.
    #
    # This file contains all Sun/Oracle specific system properties
    #
    java.runtime.name
    sun.boot.library.path
    java.vm.version
    java.vm.vendor
    java.vendor.url
    path.separator
    java.vm.name
    file.encoding.pkg
    sun.java.launcher
    user.country
    sun.os.patch.level
    java.vm.specification.name
    user.dir
    java.runtime.version
    java.awt.graphicsenv
    java.endorsed.dirs
    os.arch
    java.io.tmpdir
    line.separator
    java.vm.specification.vendor
    os.name
    sun.jnu.encoding
    java.library.path
    java.specification.name
    java.class.version
    sun.management.compiler
    os.version
    user.home
    user.timezone
    java.awt.printerjob
    idea.launcher.bin.path
    file.encoding
    java.specification.version
    java.class.path
    user.name
    java.vm.specification.version
    sun.java.command
    java.home
    sun.arch.data.model
    user.language
    java.specification.vendor
    java.vm.info
    java.version
    java.ext.dirs
    sun.boot.class.path
    java.vendor
    file.separator
    java.vendor.url.bug
    sun.io.unicode.encoding
    sun.cpu.endian
    sun.desktop
    sun.cpu.isalist
    java.security.krb5.conf
    sun.security.krb5.debug
    java.security.krb5.kdc
    java.security.krb5.realm
    java.security.auth.login.config
    javax.security.auth.useSubjectCredsOnly
  8. Restart the Components Catalog service.
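
The directory creation and file writing from steps 6 and 7 can also be scripted. The following Python sketch is an illustration only and not part of any Talend tooling: the helper name write_sun_conf is hypothetical, the demonstration writes to a temporary directory standing in for <components_catalog_path>, and it uses only a short excerpt of the property list. Pass the full list from step 7 when you use it for real.

```python
import tempfile
from pathlib import Path

# Relative location of sun.conf inside the Components Catalog folder (steps 6 and 7).
SANDBOX_DIR = Path("config/org/talend/daikon/sandbox/properties")
HEADER = [
    "#",
    "# This file contains all Sun/Oracle specific system properties",
    "#",
]


def write_sun_conf(catalog_path, property_names):
    """Create the sandbox directories if needed and write sun.conf.

    Hypothetical helper for illustration; returns the path of the written file.
    """
    target_dir = Path(catalog_path) / SANDBOX_DIR
    # parents=True creates every missing directory in the path.
    target_dir.mkdir(parents=True, exist_ok=True)
    # Strip whitespace, skip blanks, keep only the first occurrence of each name.
    names = list(dict.fromkeys(n.strip() for n in property_names if n.strip()))
    sun_conf = target_dir / "sun.conf"
    sun_conf.write_text("\n".join(HEADER + names) + "\n", encoding="utf-8")
    return sun_conf


if __name__ == "__main__":
    # Temporary directory standing in for <components_catalog_path>.
    with tempfile.TemporaryDirectory() as fake_catalog:
        # Excerpt of the property list from step 7, for brevity.
        excerpt = [
            "java.home",
            "java.security.krb5.conf",
            "javax.security.auth.useSubjectCredsOnly",
        ]
        created = write_sun_conf(fake_catalog, excerpt)
        print(created.read_text(encoding="utf-8"))
```

Running the helper a second time overwrites sun.conf with the list you pass in, so keep the full property list in one place rather than appending by hand.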

Results

In Talend Data Preparation, the Hive database is now available in the database dataset import form, in the Database type drop-down list.
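
If Hive does not appear in the list, one thing worth ruling out is a malformed driver entry in jdbc_config.json. The following Python sketch is a rough check under stated assumptions, not a Talend utility: check_driver_entry is a hypothetical helper, the required keys are simply those used in the example entry, and the pattern assumes every path has the three-part form mvn:<groupId>/<artifactId>/<version> used in that example. The resolver may accept other forms, so treat a reported path as something to review rather than as a definite error.

```python
import re

# Assumed shape of each path: mvn:<groupId>/<artifactId>/<version>.
MVN_PATH = re.compile(r"^mvn:[^/:\s]+/[^/:\s]+/[^/:\s]+$")
# Keys used by the example Hive entry; assumed to be required.
REQUIRED_KEYS = ("id", "class", "url", "paths")


def check_driver_entry(entry):
    """Return a list of human-readable problems found in one driver entry."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    for item in entry.get("paths", []):
        path = item.get("path", "")
        if not MVN_PATH.match(path):
            problems.append(f"unexpected path format: {path}")
    return problems


if __name__ == "__main__":
    sample = {
        "id": "Hive",
        "class": "org.apache.hive.jdbc.HiveDriver",
        "url": "jdbc:hive2://host:10000/default;principal=<your principal>",
        "paths": [
            {"path": "mvn:org.apache.hive/hive-jdbc/3.1.2"},
            # Hypothetical entry with ":" instead of "/" before the version.
            {"path": "mvn:org.example/demo-artifact:1.0"},
        ],
    }
    for problem in check_driver_entry(sample):
        print(problem)
```

To check your own file, load it with json.load and pass the Hive entry to check_driver_entry. An empty result only means the entry matches the assumed shape, not that every artifact can be resolved.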

Important: The Username and Password fields are mandatory. However, because authentication is performed through Kerberos in this case, you can fill them with placeholder values.

When exporting a preparation made on data stored in your Hive database, you can choose to process the data on the Talend Data Preparation server.

For more information on how to import data from a database, see Adding a dataset from a database.