The HDP version variable issue in MapReduce Jobs and Spark Jobs - 8.0

Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
Talend Studio
Design and Development > Designing Jobs > Hadoop distributions > Hortonworks
Design and Development > Designing Jobs > Job Frameworks > MapReduce
Design and Development > Designing Jobs > Job Frameworks > Spark Batch
Design and Development > Designing Jobs > Job Frameworks > Spark Streaming

Set up the hdp.version parameter to resolve the Hortonworks version issue

Hortonworks relies on the hdp.version environment variable in its configuration files to support rolling upgrades. This variable can lead to a known issue when running Spark and MapReduce Jobs from Talend Studio.

The Studio retrieves these Hortonworks configuration files, along with the hdp.version variable they reference, from the Hortonworks cluster. When you define the Hadoop connection in the Studio, the Studio generates a hadoop-conf-xxx.jar from these configuration files and adds this JAR file, and thus the unresolved variable, to the classpath of your Job. This can lead to the following known issue:

[ERROR]: org.apache.spark.SparkContext - Error initializing SparkContext.
            hdp.version is not found,
            Please set HDP_VERSION=xxx in,
            or set -Dhdp.version=xxx in spark.{driver|}.extraJavaOptions
            or set SPARK_JAVA_OPTS="-Dhdp.verion=xxx" in
            If you're running Spark under HDP.
            at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:999)
            at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
            at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
            at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
            at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
                    at dev_v6_001.test_hdp_1057_0_1.test_hdp_1057.runJobInTOS(
                    at dev_v6_001.test_hdp_1057_0_1.test_hdp_1057.main(
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
            Diagnostics: Exception from container-launch.
            Container id: container_1496650120478_0011_02_000001
            Exit code: 1
            Exception message: /hadoop/yarn/local/usercache/abbass/appcache/application_1496650120478_0011/container_1496650120478_0011_02_000001/ line 21: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-mapreduce-client/*:/usr/hdp/current/hadoop-mapreduce-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
java.lang.IllegalArgumentException: Unable to parse '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a URI, check the setting for mapreduce.application.framework.path
            at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(
            at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(
            at org.apache.hadoop.mapreduce.Job$
            at org.apache.hadoop.mapreduce.Job$
            at Method)
            at org.apache.hadoop.mapreduce.Job.submit(
            at org.apache.hadoop.mapred.JobClient$
            at org.apache.hadoop.mapred.JobClient$
            at Method)
            at org.apache.hadoop.mapred.JobClient.submitJobInternal(
            at org.apache.hadoop.mapred.JobClient.submitJob(
            at org.talend.hadoop.mapred.lib.MRJobClient.runJob(
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26.runMRJob(
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26.access$2(
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26$
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26$
            at Method)
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26.tRowGenerator_1Process(
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26.runJobInTOS(
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26.main(
            Caused by: Illegal character in path at index 11: /hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework
                at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(
                ... 27 more

These error messages can appear together or separately.
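The "bad substitution" line in the container log is the shell itself rejecting the unexpanded token: ${hdp.version} is not a valid Bash parameter expansion, because a dot is not allowed in a variable name. A minimal reproduction, independent of Hadoop:

```shell
# ${hdp.version} is invalid Bash syntax (dots are not allowed in variable
# names), so any classpath entry that still contains the literal token
# triggers the same "bad substitution" error seen in the container log.
bash -c 'echo /usr/hdp/${hdp.version}/hadoop' 2>&1 || true
```

This is why the variable must be defined on the cluster side, as described below: once hdp.version resolves to a real version number, the classpath no longer contains the literal token.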


This article applies to:

  • Subscription-based Talend Studio solutions with Big Data

  • Spark and MapReduce Jobs

Find the hdp.version value to be used


Identify the version of your Hortonworks cluster.
  • You may directly ask the administrator of your cluster about the correct version to use.

  • You can also check the /usr/hdp/ directory of each machine in your cluster. This directory usually contains several version directories, and a symbolic link named current points to the version you should use. For example:
    [root@sandbox /]# ls  -lth /usr/hdp/
    total 16K
    drwxr-xr-x 50 root root 4.0K Jun  5 07:59
    drwxr-xr-x 32 root root 4.0K May  5 13:19
    drwxr-xr-x  3 root root 4.0K May  5 13:18 share
    drwxr-xr-x  2 root root 4.0K May  5 12:48 current ->

    In this example, the version to be used is hdp.version=
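Scripted on a cluster node, the check above amounts to resolving the current link. The snippet below builds a mock directory tree standing in for /usr/hdp so it runs anywhere; the version number 2.6.0.3-8 is illustrative only, and on a real node you would point readlink at /usr/hdp/current directly:

```shell
# Mock layout standing in for /usr/hdp on a cluster node;
# the version number 2.6.0.3-8 is illustrative only.
mock=$(mktemp -d)
mkdir -p "$mock/2.6.0.3-8"
ln -s "$mock/2.6.0.3-8" "$mock/current"

# Resolve the "current" symlink to obtain the value to use for hdp.version.
version=$(basename "$(readlink -f "$mock/current")")
echo "hdp.version=$version"            # prints hdp.version=2.6.0.3-8

rm -rf "$mock"
```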


Then configure the Talend Studio Job to be used:
  • For Spark Jobs, see Resolve the hdp.version variable issue for Spark Jobs.

  • For MapReduce Jobs, see Resolve the hdp.version variable issue for MapReduce Jobs.

Resolve the hdp.version variable issue for Spark Jobs


  1. Define the hdp.version parameter in your cluster.

    The easiest way of doing this is to add this parameter to the yarn-site.xml configuration file.

    1. In Ambari, click the Yarn service on the service list on the left, then click Configs to open the configuration page and click the Advanced tab.
    2. Scroll down to the Custom yarn-site section at the end of the page and click Custom yarn-site to expand the list.
    3. Click Add property to open the [Add property] dialog box.
    4. Enter hdp.version= followed by the version number you found using the procedure at the beginning of this article, then click Add to validate the change. The hdp.version parameter appears in the Custom yarn-site parameter list.
    5. Click Save to validate the new configuration and restart the services to implement the hdp.version parameter in the yarn-site.xml file.
  2. In the Studio, open the Spark Job to be used and click the Run tab to open its view.
  3. Click Spark configuration, then in this view, select the Set hdp.version check box and enter, within double quotation marks, the same version number you entered in the cluster.
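For reference, the Ambari steps in step 1 produce an entry equivalent to the following in yarn-site.xml (2.6.0.3-8 is an illustrative version number; use the value found on your own cluster):

```xml
<!-- Custom yarn-site entry added via Ambari; the version number is illustrative -->
<property>
  <name>hdp.version</name>
  <value>2.6.0.3-8</value>
</property>
```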

    This procedure explains only the actions to be performed to solve the HDP version issue for a Spark Job. You still need to properly configure the other parts of your Job before you can run it successfully.
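Outside the Studio, the error message itself points to the equivalent fix for direct spark-submit runs: passing -Dhdp.version through the driver and YARN application-master JVM options. A sketch in spark-defaults.conf form, using standard Spark properties and an illustrative version number:

```
# spark-defaults.conf (client deploy mode; 2.6.0.3-8 is illustrative)
spark.driver.extraJavaOptions     -Dhdp.version=2.6.0.3-8
spark.yarn.am.extraJavaOptions    -Dhdp.version=2.6.0.3-8
```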