Configuring a Kerberos-secured connection to Hive - 2.3

Talend Data Preparation User Guide

author
Talend Documentation Team
EnrichVersion
6.5
2.3
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
task
Data Quality and Preparation > Cleansing data
EnrichPlatform
Talend Data Preparation

Hive is one of the many databases that can be added to the list of data sources available for Talend Data Preparation.

The section Adding a new database type explains how to add new JDBC drivers to extend the list of databases available in Talend Data Preparation. This example focuses on configuring a direct connection between your Hive database and Talend Data Preparation. An additional configuration step allows you to secure this connection with Kerberos.

Procedure

  1. In your Components Catalog installation folder, open the file config/settings.xml.
    By default, the Components Catalog installation folder is located at <TDP_installation_folder>/services/.
  2. Add the Cloudera repository to settings.xml:
    <settings>
        <profiles>
            <profile>
                <id>cloudera</id>
                <activation>
                    <activeByDefault>true</activeByDefault>
                </activation>
                <repositories>
                    <repository>
                        <id>cloudera</id>
                        <name>Cloudera repository</name>
                        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
                        <layout>default</layout>
                    </repository>
                </repositories>
            </profile>
        </profiles>
    </settings>
  3. Open <components_catalog_path>/config/jdbc_config.json and add the Hive driver:
    {
        "id": "Hive",
        "class": "org.apache.hive.jdbc.HiveDriver",
        "url": "jdbc:hive2://host:10000/default;principal=<your principal>",
        "paths": [
            {
                "path": "mvn:commons-el/commons-el/1.0"
            },
            {
                "path": "mvn:org.datanucleus/datanucleus-core/3.2.10"
            },
            {
                "path": "mvn:asm/asm-commons/3.1"
            },
            {
                "path": "mvn:tomcat/jasper-compiler/5.5.23"
            },
            {
                "path": "mvn:org.apache.derby/derby/10.11.1.1"
            },
            {
                "path": "mvn:jline/jline/2.12"
            },
            {
                "path": "mvn:org.apache.commons/commons-compress/1.4.1"
            },
            {
                "path": "mvn:com.fasterxml.jackson.core/jackson-annotations/2.2.2"
            },
            {
                "path": "mvn:org.apache.hive/hive-metastore/1.1.0-cdh5.13.1"
            },
            {
                "path": "mvn:org.apache.hive/hive-shims/1.1.0-cdh5.13.1"
            },
            {
                "path": "mvn:joda-time/joda-time/1.6"
            },
            {
                "path": "mvn:org.codehaus.jackson/jackson-mapper-asl/1.9.2"
            },
            {
                "path": "mvn:com.google.code.findbugs/jsr305/3.0.0"
            },
            {
                "path": "mvn:org.apache.zookeeper/zookeeper/3.4.5-cdh5.13.1"
            },
            {
                "path": "mvn:antlr/antlr/2.7.7"
            },
            {
                "path": "mvn:commons-pool/commons-pool/1.5.4"
            },
            {
                "path": "mvn:org.apache.avro/avro/1.7.6-cdh5.13.1"
            },
            {
                "path": "mvn:org.antlr/stringtemplate/3.2.1"
            },
            {
                "path": "mvn:org.slf4j/slf4j-log4j12/1.7.5"
            },
            {
                "path": "mvn:org.eclipse.jetty.aggregate/jetty-all/7.6.0.v20120127"
            },
            {
                "path": "mvn:com.twitter/parquet-hadoop-bundle/1.5.0-cdh5.13.1"
            },
            {
                "path": "mvn:com.sun.jersey/jersey-servlet/1.14"
            },
            {
                "path": "mvn:commons-dbcp/commons-dbcp/1.4"
            },
            {
                "path": "mvn:org.slf4j/slf4j-api/1.7.5"
            },
            {
                "path": "mvn:javax.servlet.jsp/jsp-api/2.1"
            },
            {
                "path": "mvn:com.codahale.metrics/metrics-jvm/3.0.2"
            },
            {
                "path": "mvn:com.thoughtworks.paranamer/paranamer/2.3"
            },
            {
                "path": "mvn:tomcat/jasper-runtime/5.5.23"
            },
            {
                "path": "mvn:com.fasterxml.jackson.core/jackson-databind/2.2.2"
            },
            {
                "path": "mvn:asm/asm-tree/3.1"
            },
            {
                "path": "mvn:com.codahale.metrics/metrics-core/3.0.2"
            },
            {
                "path": "mvn:com.sun.jersey/jersey-core/1.14"
            },
            {
                "path": "mvn:org.apache.hive/hive-service/1.1.0-cdh5.13.1"
            },
            {
                "path": "mvn:org.jamon/jamon-runtime/2.3.1"
            },
            {
                "path": "mvn:com.sun.jersey/jersey-server/1.14"
            },
            {
                "path": "mvn:org.apache.commons/commons-lang3/3.1"
            },
            {
                "path": "mvn:com.codahale.metrics/metrics-json/3.0.2"
            },
            {
                "path": "mvn:org.apache.hive/hive-common/1.1.0-cdh5.13.1"
            },
            {
                "path": "mvn:org.apache.curator/curator-client/2.6.0"
            },
            {
                "path": "mvn:org.apache.thrift/libfb303/0.9.3"
            },
            {
                "path": "mvn:org.apache.thrift/libthrift/0.9.3"
            },
            {
                "path": "mvn:org.apache.geronimo.specs/geronimo-annotation_1.0_spec/1.1.1"
            },
            {
                "path": "mvn:net.sf.opencsv/opencsv/2.3"
            },
            {
                "path": "mvn:org.apache.geronimo.specs/geronimo-jaspic_1.0_spec/1.0"
            },
            {
                "path": "mvn:commons-lang/commons-lang/2.6"
            },
            {
                "path": "mvn:com.fasterxml.jackson.core/jackson-core/2.2.2"
            },
            {
                "path": "mvn:javax.mail/mail/1.4.1"
            },
            {
                "path": "mvn:javax.activation/activation/1.1"
            },
            {
                "path": "mvn:org.tukaani/xz/1.0"
            },
            {
                "path": "mvn:com.jolbox/bonecp/0.8.0.RELEASE"
            },
            {
                "path": "mvn:org.apache.httpcomponents/httpcore/4.2.5"
            },
            {
                "path": "mvn:org.apache.hive/hive-serde/1.1.0-cdh5.13.1"
            },
            {
                "path": "mvn:commons-cli/commons-cli/1.2"
            },
            {
                "path": "mvn:com.google.guava/guava/14.0.1"
            },
            {
                "path": "mvn:org.apache.geronimo.specs/geronimo-jta_1.1_spec/1.1.1"
            },
            {
                "path": "mvn:org.apache.httpcomponents/httpclient/4.2.5"
            },
            {
                "path": "mvn:commons-codec/commons-codec/1.4"
            },
            {
                "path": "mvn:log4j/log4j/1.2.16"
            },
            {
                "path": "mvn:org.apache.ant/ant/1.9.1"
            },
            {
                "path": "mvn:org.datanucleus/datanucleus-rdbms/3.2.9"
            },
            {
                "path": "mvn:javax.transaction/jta/1.1"
            },
            {
                "path": "mvn:commons-logging/commons-logging/1.1.3"
            },
            {
                "path": "mvn:log4j/apache-log4j-extras/1.2.17"
            },
            {
                "path": "mvn:javax.servlet/servlet-api/2.5"
            },
            {
                "path": "mvn:org.apache.ant/ant-launcher/1.9.1"
            },
            {
                "path": "mvn:net.sf.jpam/jpam/1.1"
            },
            {
                "path": "mvn:org.codehaus.jackson/jackson-core-asl/1.9.2"
            },
            {
                "path": "mvn:org.datanucleus/datanucleus-api-jdo/3.2.6"
            },
            {
                "path": "mvn:org.apache.hive.shims/hive-shims-common/1.1.0-cdh5.13.1"
            },
            {
                "path": "mvn:javax.jdo/jdo-api/3.0.1"
            },
            {
                "path": "mvn:org.xerial.snappy/snappy-java/1.0.4.1"
            },
            {
                "path": "mvn:org.apache.curator/curator-framework/2.6.0"
            },
            {
                "path": "mvn:asm/asm/3.2"
            },
            {
                "path": "mvn:org.apache.hive/hive-jdbc/1.1.0-cdh5.13.1"
            },
            {
                "path": "mvn:org.apache.hadoop/hadoop-yarn-server-resourcemanager/2.6.0"
            },
            {
                "path": "mvn:org.apache.hadoop/hadoop-yarn-server-applicationhistoryservice/2.6.0"
            },
            {
                "path": "mvn:org.apache.hadoop/hadoop-annotations/2.6.0"
            },
            {
                "path": "mvn:org.apache.hadoop/hadoop-yarn-common/2.6.0"
            },
            {
                "path": "mvn:org.apache.hadoop/hadoop-yarn-api/2.6.0"
            },
            {
                "path": "mvn:org.apache.hadoop/hadoop-yarn-server-common/2.6.0"
            },
            {
                "path": "mvn:org.apache.hadoop/hadoop-yarn-server-web-proxy/2.6.0"
            },
            {
                "path": "mvn:org.apache.hadoop/hadoop-common/2.7.1"
            },
            {
                "path": "mvn:org.apache.hadoop/hadoop-auth/2.7.1"
            },
            {
                "path": "mvn:org.apache.hadoop/hadoop-core/1.2.1"
            },
            {
                "path": "mvn:org.apache.hive.shims/hive-shims-0.23/1.1.0"
            }
        ]
    }
  4. Open <components_catalog_path>/start.sh and add javax.security.auth.useSubjectCredsOnly=false to the system properties:
    THE_CMD="$JAVA_BIN $JAVA_OPTS -Djavax.security.auth.useSubjectCredsOnly=false -cp \"$APP_CLASSPATH\" $APP_CLASS $*"
  5. Open <components_catalog_path>/config/application.properties and set krb5.config to the location of the Kerberos configuration file on the machine where the Components Catalog server is installed:
    krb5.config=/etc/krb5.conf
  6. Create a file named sun.conf in <components_catalog_path>/config/org/talend/daikon/sandbox/properties/.
    This file is needed to allow the Hive component to access specific system properties.
    Warning: If the directories in the path org/talend/daikon/sandbox/properties/ do not exist in <components_catalog_path>/config, create them.
  7. Add the following content in sun.conf:
    #
    # This file contains all Sun/Oracle specific system properties
    #
    java.runtime.name
    sun.boot.library.path
    java.vm.version
    java.vm.vendor
    java.vendor.url
    path.separator
    java.vm.name
    file.encoding.pkg
    sun.java.launcher
    user.country
    sun.os.patch.level
    java.vm.specification.name
    user.dir
    java.runtime.version
    java.awt.graphicsenv
    java.endorsed.dirs
    os.arch
    java.io.tmpdir
    line.separator
    java.vm.specification.vendor
    os.name
    sun.jnu.encoding
    java.library.path
    java.specification.name
    java.class.version
    sun.management.compiler
    os.version
    user.home
    user.timezone
    java.awt.printerjob
    idea.launcher.bin.path
    file.encoding
    java.specification.version
    java.class.path
    user.name
    java.vm.specification.version
    sun.java.command
    java.home
    sun.arch.data.model
    user.language
    java.specification.vendor
    java.vm.info
    java.version
    java.ext.dirs
    sun.boot.class.path
    java.vendor
    file.separator
    java.vendor.url.bug
    sun.io.unicode.encoding
    sun.cpu.endian
    sun.desktop
    sun.cpu.isalist
    java.security.krb5.conf
    sun.security.krb5.debug
    java.security.krb5.kdc
    java.security.krb5.realm
    java.security.auth.login.config
    javax.security.auth.useSubjectCredsOnly
  8. Restart the Components Catalog service.
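
The files edited in steps 3 and 7 are long enough that a typo or a repeated line is easy to miss. The following Python sketch is one possible way to check them before restarting the service. It assumes that each Maven path follows the mvn:<groupId>/<artifactId>/<version> pattern used by most entries in the driver definition, and that sun.conf lists one property name per line, with # marking comments. The function names check_driver_paths and missing_kerberos_properties are illustrative and are not part of any Talend tool.

```python
import re
from collections import Counter

# Assumed path shape: mvn:<groupId>/<artifactId>/<version>
MVN_PATH = re.compile(r"^mvn:[^/:\s]+/[^/:\s]+/[^/:\s]+$")

# Kerberos-related property names listed at the end of sun.conf
KERBEROS_PROPERTIES = (
    "java.security.krb5.conf",
    "sun.security.krb5.debug",
    "java.security.krb5.kdc",
    "java.security.krb5.realm",
    "java.security.auth.login.config",
    "javax.security.auth.useSubjectCredsOnly",
)


def check_driver_paths(entry):
    """Return (malformed, duplicated) path lists for one driver entry (a dict)."""
    paths = [item["path"] for item in entry.get("paths", [])]
    # Paths that do not match the assumed mvn:group/artifact/version shape
    malformed = [p for p in paths if not MVN_PATH.match(p)]
    # Paths that appear more than once in the entry
    counts = Counter(paths)
    duplicated = sorted(p for p, n in counts.items() if n > 1)
    return malformed, duplicated


def missing_kerberos_properties(sun_conf_text):
    """Return the Kerberos property names that the sun.conf content does not list."""
    listed = {
        line.strip()
        for line in sun_conf_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }
    return [name for name in KERBEROS_PROPERTIES if name not in listed]
```

To use it, load the Hive driver entry from jdbc_config.json with json.load and pass it to check_driver_paths, then pass the text of sun.conf to missing_kerberos_properties. All three returned lists should be empty.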

Results

In Talend Data Preparation, the Hive database is now available in the database dataset import form, in the Database type drop-down list.

Although the Username and Password fields are marked as mandatory, you can skip them in this case since the authentication is performed using Kerberos.
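
Because no username or password is supplied, the connection depends on the principal parameter of the JDBC URL configured in jdbc_config.json. In Hive JDBC URLs, this parameter typically names the HiveServer2 service principal, in the form primary/instance@REALM. The following Python sketch shows one way to check that the placeholder has been replaced with a plausible value. The function names and the example principal hive/hs2.example.com@EXAMPLE.COM are hypothetical, and the pattern is a simplifying assumption rather than a full Kerberos name validator.

```python
import re

# Assumed principal shape: primary/instance@REALM
PRINCIPAL = re.compile(r"^[^/@\s]+/[^/@\s]+@[^/@\s]+$")


def hive_url_principal(url):
    """Return the principal setting of a jdbc:hive2 URL, or None if it is absent."""
    if not url.startswith("jdbc:hive2://"):
        raise ValueError("not a jdbc:hive2 URL")
    # Drop the optional ?hive_conf and #hive_var sections that follow the session settings
    session_part = re.split(r"[?#]", url, maxsplit=1)[0]
    # Session settings follow the database name, separated by semicolons
    _, *settings = session_part.split(";")
    for setting in settings:
        key, _, value = setting.partition("=")
        if key.strip() == "principal":
            return value.strip()
    return None


def looks_like_service_principal(principal):
    """Return True if the value matches the assumed primary/instance@REALM shape."""
    return bool(principal) and PRINCIPAL.match(principal) is not None


# Hypothetical URL, for illustration only
url = "jdbc:hive2://host:10000/default;principal=hive/hs2.example.com@EXAMPLE.COM"
principal = hive_url_principal(url)
```

Passing the result to looks_like_service_principal returns False when the URL still contains the <your principal> placeholder or has no principal setting at all.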

When exporting a preparation built on data stored in your Hive database, you can choose to process the data on the Talend Data Preparation server.

For more information on how to import data from a database, see Adding a dataset from a database.