Use the Hadoop property filter of Talend Studio to resolve the Hive-on-Tez issue for Spark Jobs on Hortonworks - 8.0

The Hive-on-Tez issue with Hortonworks in Spark Jobs

Version
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
Module
Talend Studio
Content
Design and Development > Designing Jobs > Hadoop distributions > Hortonworks
Design and Development > Designing Jobs > Job Frameworks > Spark Batch
Design and Development > Designing Jobs > Job Frameworks > Spark Streaming
Last publication date
2024-02-06
If you need to use the original hive-site.xml file of your Hortonworks cluster, or if you do not have access to the Spark-specific configuration files, you can use the property filter provided in the Hadoop metadata wizard in Talend Studio to resolve this issue.
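
For reference, the property at the root of this issue typically appears in the hive-site.xml file of the cluster as follows. The tez value shown here is an assumption based on a default Hive-on-Tez setup; your cluster may carry a different value:

  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>

Filtering this property out of the imported configuration lets the Spark-specific Hive configuration be used instead, as described in the following procedure.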

Procedure

  1. Define the Hadoop connection to your Hortonworks cluster in the Repository, if you have not already done so.
  2. Right-click this connection and, from the contextual menu, select Edit Hadoop cluster to open the Hadoop cluster connection wizard.
  3. Click Next to open the second step of this wizard and select the Use custom Hadoop configurations check box.
  4. Click the [...] button next to Use custom Hadoop configurations to open the Hadoop configuration import wizard.
  5. Select the Hortonworks version you are using and then perform one of the following operations:
    • If your Hortonworks cluster has Ambari installed, select the Retrieve configuration from Ambari or Cloudera radio button and click Next (to verify the Ambari host and credentials beforehand, see the first sketch after this procedure). Then do the following:
      1. In the wizard that is opened, enter the Ambari credentials in the corresponding fields and click Connect.

        A cluster name is then displayed in the Discovered clusters drop-down list.

      2. From the list, select your cluster and click Fetch to retrieve the configuration of the related services.

      3. Click the [...] button next to Hadoop property filter to open the wizard.

    • If your Hortonworks cluster does not have Ambari installed, you must import the Hive configuration files from a local directory. This means you need to contact the administrator of your cluster to obtain the Hive configuration files or download these files yourself (to confirm that these files define the property to be filtered, see the second sketch after this procedure).

      Once you have these files, do the following:

      1. In the Hadoop configuration import wizard, select the Import configuration from local files radio button and click Next.

      2. Click Browse... to find the Hive configuration files.

      3. Click the [...] button next to Hadoop property filter to open the wizard.

  6. Click the [+] button to add a row and enter hive.execution.engine in this new row to filter out this property.
  7. Click OK to validate this addition and go back to the Hadoop configuration import wizard.
  8. Click Finish to complete the import and close the import wizard, then go back to the Hadoop cluster connection wizard.
  9. Click Finish to validate the changes and, in the pop-up dialog box, click Yes to accept the propagation. The wizard then closes, and the Spark-specific Hive configuration file is used along with this Hadoop connection.

    This new configuration is effective only for the Jobs that use this connection.
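
If the Discovered clusters list stays empty in the Ambari branch of step 5, you can verify the Ambari host and credentials outside Talend Studio through the Ambari REST API. The following is a minimal Python sketch; the host, port, user, and password are placeholders to replace with your own values:

  # List the clusters that Ambari exposes, to confirm the host and
  # credentials before entering them in the wizard. All connection
  # values below are placeholders.
  import base64
  import json
  import urllib.request

  AMBARI_URL = "http://ambari-host.example.com:8080/api/v1/clusters"
  USER, PASSWORD = "admin", "admin"

  request = urllib.request.Request(AMBARI_URL)
  token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
  request.add_header("Authorization", "Basic " + token)

  with urllib.request.urlopen(request) as response:
      data = json.load(response)

  # Each item carries the cluster name that the wizard shows in the
  # Discovered clusters drop-down list.
  for item in data.get("items", []):
      print(item["Clusters"]["cluster_name"])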
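
If you imported the Hive configuration files from a local directory, you can confirm that they actually define the property filtered in step 6. The following is a minimal Python sketch, assuming hive-site.xml sits in the current directory; adjust the path to wherever you saved the files:

  # Report whether hive-site.xml sets hive.execution.engine, the
  # property to filter out for Spark Jobs.
  import xml.etree.ElementTree as ET

  root = ET.parse("hive-site.xml").getroot()
  for prop in root.iter("property"):
      if prop.findtext("name") == "hive.execution.engine":
          print("hive.execution.engine =", prop.findtext("value"))
          break
  else:
      print("hive.execution.engine is not set in this file")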