Use a Spark-specific Hive configuration file to resolve the Hive-on-Tez issue for Spark Jobs on Hortonworks - 8.0

The Hive-on-Tez issue with Hortonworks in Spark Jobs

Version
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
Module
Talend Studio
Content
Design and Development > Designing Jobs > Hadoop distributions > Hortonworks
Design and Development > Designing Jobs > Job Frameworks > Spark Batch
Design and Development > Designing Jobs > Job Frameworks > Spark Streaming
Last publication date
2024-02-06

Hortonworks ships a Spark-specific hive-site.xml file to resolve this Hive-on-Tez issue. You can use this file to define the connection to your Hortonworks cluster in Talend Studio.

This file is stored in the Spark configuration folder of your Hortonworks cluster: /etc/spark/conf.
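As an illustration only (the exact contents vary by Hortonworks version and cluster setup, and the host name below is a placeholder), the Spark-specific hive-site.xml is typically much smaller than the regular Hive one: it mainly carries the metastore connection and omits the Tez execution settings that cause the conflict.

```xml
<!-- Illustrative sketch only; the property value is a placeholder. -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://your-metastore-host:9083</value>
  </property>
  <!-- Unlike the regular /etc/hive/conf/hive-site.xml, this file
       does not set hive.execution.engine to tez. -->
</configuration>
```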

Procedure

  1. Obtain this Spark-specific Hive configuration file from the administrator of your cluster.
  2. Download the regular Hive configuration files from your cluster, for example, using Ambari.
  3. In the downloaded files, replace the hive-site.xml file taken from /etc/hive/conf with the Spark-specific hive-site.xml file taken from /etc/spark/conf.
  4. Define the Hadoop connection to your Hortonworks cluster in the Repository if you have not done so.
  5. Right-click this connection and from the contextual menu, select Edit Hadoop cluster to open the Hadoop cluster connection wizard.
  6. Click Next to open the second step of this wizard and select the Use custom Hadoop configurations check box.
  7. Click the [...] button next to Use custom Hadoop configurations to open the Hadoop configuration import wizard.
  8. Select the Hortonworks version you are using and then select the Import configuration from local files radio button.
  9. Click Next, then click Browse... and navigate to the Hive configuration files in which you placed the Spark-specific hive-site.xml file in the previous steps.
  10. Click Finish to complete the import, close the import wizard, and return to the Hadoop cluster connection wizard.
  11. Click Finish to validate the changes and, in the pop-up dialog box, click Yes to accept the propagation. The wizard closes, and the Spark-specific Hive configuration file is then used with this Hadoop connection.

    This new configuration is effective only for the Jobs that use this connection.
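The file swap in steps 2 and 3 can be sketched as the shell commands below. This is a local simulation under stated assumptions: the directory and file names are hypothetical stand-ins for the configuration files you actually downloaded from your cluster, and the file contents are placeholders.

```shell
set -e

# Assumption: the regular Hive configuration files downloaded from the
# cluster (for example, exported via Ambari) are in this local folder.
CONF_DIR="$HOME/hdp-conf"
mkdir -p "$CONF_DIR"
echo '<configuration><!-- regular Hive settings --></configuration>' \
  > "$CONF_DIR/hive-site.xml"

# Assumption: the Spark-specific file obtained from your administrator
# (originally /etc/spark/conf/hive-site.xml) was saved here.
SPARK_HIVE_SITE="$HOME/spark-hive-site.xml"
echo '<configuration><!-- Spark-specific Hive settings --></configuration>' \
  > "$SPARK_HIVE_SITE"

# Replace the regular hive-site.xml with the Spark-specific one; all
# other downloaded configuration files stay unchanged.
cp "$SPARK_HIVE_SITE" "$CONF_DIR/hive-site.xml"

echo "hive-site.xml replaced in $CONF_DIR"
```

The resulting folder ($HOME/hdp-conf in this sketch) is the one you point the import wizard at in step 9.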