Hive-on-Tez issue in Spark Jobs when using Hortonworks
The Hive configuration in a Hortonworks cluster is specific. The configuration uses Tez as Hive engine and can lead to a known issue when running Talend Studio Hive components in Spark Jobs.
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState': at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:983) at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110) at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:62) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:552) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:307) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:321) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:451) at dev_v6_001.test_hdp26_hive_0_1.test_hdp26_hive.tRowGenerator_1Process(test_hdp26_hive.java:1152) at dev_v6_001.test_hdp26_hive_0_1.test_hdp26_hive.run(test_hdp26_hive.java:1597) at dev_v6_001.test_hdp26_hive_0_1.test_hdp26_hive.runJobInTOS(test_hdp26_hive.java:1387) at dev_v6_001.test_hdp26_hive_0_1.test_hdp26_hive.main(test_hdp26_hive.java:1272) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:980) ... 11 more Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog': at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:176) at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86) at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101) at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101) at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100) at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157) at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32) ... 16 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:173) ... 24 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262) at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65) ... 29 more Caused by: java.lang.RuntimeException: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:535) at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188) ... 37 more Caused by: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration at org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:166) at org.apache.tez.client.TezClient.getTezJarResources(TezClient.java:831) at org.apache.tez.client.TezClient.start(TezClient.java:355) at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:184) at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:116) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:532) ... 38 more
This known issue demonstrated above happens during Hive initialization by Spark, during which Hive looks for a Tez configuration as Tez is the designated execution engine by Hortonworks. However, as Spark does not expect Tez, this configuration should be overridden:
Environment:
-
Subscription-based Talend Studio solution with Big Data
-
Spark Jobs
Did this page help you?
If you find any issues with this page or its content – a typo, a missing step, or a technical error – let us know how we can improve!