Hive-on-Tez issue in Spark Jobs when using Hortonworks
The Hive configuration in a Hortonworks cluster is specific. The configuration uses Tez as Hive engine and can lead to a known issue when running Talend Studio Hive components in Spark Jobs.
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState': at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:983) at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110) at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:62) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:552) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:307) at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:321) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:451) at dev_v6_001.test_hdp26_hive_0_1.test_hdp26_hive.tRowGenerator_1Process(test_hdp26_hive.java:1152) at dev_v6_001.test_hdp26_hive_0_1.test_hdp26_hive.run(test_hdp26_hive.java:1597) at dev_v6_001.test_hdp26_hive_0_1.test_hdp26_hive.runJobInTOS(test_hdp26_hive.java:1387) at dev_v6_001.test_hdp26_hive_0_1.test_hdp26_hive.main(test_hdp26_hive.java:1272) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:980) ... 11 more Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog': at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:176) at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86) at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101) at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101) at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100) at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157) at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32) ... 16 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:173) ... 24 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262) at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65) ... 29 more Caused by: java.lang.RuntimeException: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:535) at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188) ... 37 more Caused by: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration at org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:166) at org.apache.tez.client.TezClient.getTezJarResources(TezClient.java:831) at org.apache.tez.client.TezClient.start(TezClient.java:355) at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:184) at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:116) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:532) ... 38 more
This known issue demonstrated above happens during Hive initialization by Spark, during which Hive looks for a Tez configuration as Tez is the designated execution engine by Hortonworks. However, as Spark does not expect Tez, this configuration should be overridden:
Environment:
Subscription-based Talend Studio solution with Big Data
Spark Jobs
Use a Spark-specific Hive configuration file to resolve the Hive-on-Tez issue for Spark Jobs on Hortonworks
Hortonworks ships a Spark-specific hive-site.xml file to resolve this Hive-on-Tez issue. You can use this file to define the connection to your Hortonworks cluster in the Studio.
This file is stored in the Spark configuration folder of your Hortonworks cluster: /etc/spark/conf.
Procedure
Use the Hadoop property filter of the Studio to resolve the Hive-on-Tez issue for Spark Jobs on Hortonworks
You need to use the original hive-site.xml file of your Hortonworks cluster or you do not have access to the Spark specific configuration files, you can use the property filter provided in the Hadoop metadata wizard in the Studio to solve this issue.