If you need to use the original hive-site.xml file of your
Hortonworks cluster, or you do not have access to the Spark-specific configuration files,
you can use the property filter provided in the Hadoop metadata wizard in Talend Studio to solve
this issue.
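The property this procedure filters out is hive.execution.engine. In a hive-site.xml file it is declared as a standard Hadoop property entry, for example (the value shown, tez, is the usual Hortonworks default and is given here only as an illustration):

```xml
<!-- Excerpt from a hive-site.xml file; the value is illustrative -->
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
```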
Procedure
-
Define the Hadoop connection to your Hortonworks cluster in the Repository if you have not already done so.
-
Right-click this connection and from the contextual menu, select Edit Hadoop cluster to open the Hadoop cluster connection wizard.
-
Click Next to open the second step of this wizard and select the Use custom Hadoop configurations check box.
-
Click the [...] button next to Use custom
Hadoop configurations to open the Hadoop configuration
import wizard.
-
Select the Hortonworks version you are using and then perform one of the following operations:
-
If your Hortonworks cluster has Ambari installed, select the
Retrieve configuration from Ambari or Cloudera radio button and click
Next. Then do the following:
-
In the wizard that opens, enter your Ambari credentials in the corresponding fields and click Connect.
The cluster name is then displayed in the Discovered clusters drop-down list.
-
From the list, select your cluster and click
Fetch to retrieve the configuration of the
related services.
-
Click the [...] button next to Hadoop property filter to open the wizard.
-
If your Hortonworks cluster does not have Ambari, you have to import the Hive configuration files from a
local directory. This means you need to contact the administrator of
your cluster to obtain the Hive configuration files or download these
files yourself.
Once you have these files, do the following:
-
In the Hadoop configuration
import wizard, select the Import configuration from local
files radio button and click
Next.
-
Click Browse... to
find the Hive configuration files.
-
Click the [...] button next to Hadoop property filter to open the wizard.
-
Click the [+] button to add one row and enter hive.execution.engine in this new row to filter this property out.
-
Click OK to validate this addition and go back to the Hadoop configuration
import wizard.
-
Click Finish to complete the import and close the
import wizard; you return to the Hadoop cluster connection
wizard.
-
Click Finish to validate the changes and, in the pop-up
dialog box, click Yes to accept the propagation. The
wizard closes, and the Spark-specific Hive configuration is
used with this Hadoop connection.
This new configuration is effective only for the Jobs that use this connection.
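Conceptually, the property filter drops the listed property names from the imported configuration before the connection uses it. The following is a minimal sketch of that behavior in Python, not Talend's actual implementation; the function name and the sample hive-site.xml content are illustrative:

```python
import xml.etree.ElementTree as ET

def filter_properties(hive_site_xml: str, filtered: set) -> dict:
    """Parse hive-site.xml content and return its properties as a dict,
    skipping any property whose name appears in `filtered`."""
    root = ET.fromstring(hive_site_xml)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name in filtered:
            continue  # filtered out, e.g. hive.execution.engine
        props[name] = prop.findtext("value")
    return props

# Illustrative hive-site.xml content
HIVE_SITE = """
<configuration>
  <property><name>hive.execution.engine</name><value>tez</value></property>
  <property><name>hive.metastore.uris</name><value>thrift://host:9083</value></property>
</configuration>
"""

conf = filter_properties(HIVE_SITE, {"hive.execution.engine"})
print(conf)  # hive.execution.engine is absent from the result
```

With hive.execution.engine filtered out, the connection falls back to the Spark-appropriate setting instead of the engine declared in the cluster's original hive-site.xml.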