Skip to main content

Hive-on-Tez issue in Spark Jobs when using Hortonworks

The Hive configuration in a Hortonworks cluster is specific. The configuration uses Tez as Hive engine and can lead to a known issue when running Talend Studio Hive components in Spark Jobs.
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:983)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:62)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:552)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:307)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:321)
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:451)
at dev_v6_001.test_hdp26_hive_0_1.test_hdp26_hive.tRowGenerator_1Process(test_hdp26_hive.java:1152)
at dev_v6_001.test_hdp26_hive_0_1.test_hdp26_hive.run(test_hdp26_hive.java:1597)
at dev_v6_001.test_hdp26_hive_0_1.test_hdp26_hive.runJobInTOS(test_hdp26_hive.java:1387)
at dev_v6_001.test_hdp26_hive_0_1.test_hdp26_hive.main(test_hdp26_hive.java:1272)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:980)
... 11 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:176)
at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
... 16 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:173)
... 24 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
... 29 more
Caused by: java.lang.RuntimeException: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:535)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
... 37 more
Caused by: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration
at org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:166)
at org.apache.tez.client.TezClient.getTezJarResources(TezClient.java:831)
at org.apache.tez.client.TezClient.start(TezClient.java:355)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:184)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:116)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:532)
... 38 more

This known issue demonstrated above happens during Hive initialization by Spark, during which Hive looks for a Tez configuration as Tez is the designated execution engine by Hortonworks. However, as Spark does not expect Tez, this configuration should be overridden:

Environment:

  • Subscription-based Talend Studio solution with Big Data

  • Spark Jobs

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – let us know how we can improve!