The HDP version variable issue in MapReduce Jobs and Spark Jobs

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Big Data Platform
Talend Big Data
Talend Real-Time Big Data Platform
Talend Data Fabric
task
Design and Development > Designing Jobs > Job Frameworks > MapReduce
Design and Development > Designing Jobs > Hadoop distributions > Hortonworks
Design and Development > Designing Jobs > Job Frameworks > Spark Streaming
Design and Development > Designing Jobs > Job Frameworks > Spark Batch
EnrichPlatform
Talend Studio

Set up the hdp.version parameter to resolve the Hortonworks version issue

Hortonworks relies on the hdp.version environment variable in its configuration files to support rolling upgrades. This can lead to a known issue when running Spark or MapReduce Jobs from Talend Studio.

The Studio retrieves these Hortonworks configuration files, along with the hdp.version variable they reference, from the Hortonworks cluster. When you define the Hadoop connection in the Studio, the Studio generates a hadoop-conf-xxx.jar from these configuration files and adds this JAR file, and thus the unresolved variable, to the classpath of your Job. This can lead to the following known issue:

[ERROR]: org.apache.spark.SparkContext - Error initializing SparkContext.
            org.apache.spark.SparkException: 
            hdp.version is not found,
            Please set HDP_VERSION=xxx in spark-env.sh,
            or set -Dhdp.version=xxx in spark.{driver|yarn.am}.extraJavaOptions
            or set SPARK_JAVA_OPTS="-Dhdp.verion=xxx" in spark-env.sh
            If you're running Spark under HDP.
            
            at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:999)
            at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
            at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
            at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
            at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
            at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
                    at dev_v6_001.test_hdp_1057_0_1.test_hdp_1057.runJobInTOS(test_hdp_1057.java:1454)
                    at dev_v6_001.test_hdp_1057_0_1.test_hdp_1057.main(test_hdp_1057.java:1341)
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
            
            Diagnostics: Exception from container-launch.
            Container id: container_1496650120478_0011_02_000001
            Exit code: 1
            Exception message: /hadoop/yarn/local/usercache/abbass/appcache/application_1496650120478_0011/container_1496650120478_0011_02_000001/launch_container.sh: line 21: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-mapreduce-client/*:/usr/hdp/current/hadoop-mapreduce-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
java.lang.IllegalArgumentException: Unable to parse '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a URI, check the setting for mapreduce.application.framework.path
            at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:443)
            at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:142)
            at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
            at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
            at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
            at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
            at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
            at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
            at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
            at org.talend.hadoop.mapred.lib.MRJobClient.runJob(MRJobClient.java:46)
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26.runMRJob(test_mr_hdp26.java:1556)
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26.access$2(test_mr_hdp26.java:1546)
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26$1.run(test_mr_hdp26.java:1194)
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26$1.run(test_mr_hdp26.java:1)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26.tRowGenerator_1Process(test_mr_hdp26.java:1044)
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26.run(test_mr_hdp26.java:1524)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26.runJobInTOS(test_mr_hdp26.java:1483)
            at dev_v6_001.test_mr_hdp26_0_1.test_mr_hdp26.main(test_mr_hdp26.java:1431)
            Caused by: java.net.URISyntaxException: Illegal character in path at index 11: /hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework
            at java.net.URI$Parser.fail(URI.java:2848)
            at java.net.URI$Parser.checkChars(URI.java:3021)
            at java.net.URI$Parser.parseHierarchical(URI.java:3105)
            at java.net.URI$Parser.parse(URI.java:3063)
            at java.net.URI.<init>(URI.java:588)
                at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:441)
                ... 27 more

These error messages can appear together or separately.

Environment:

  • Subscription-based Talend Studio solution with Big Data

  • Spark or MapReduce Jobs

Find the hdp.version value to be used

Procedure

Identify the HDP version of your Hortonworks cluster.
  • You can ask the administrator of your cluster for the correct version to use.

  • You can also check the /usr/hdp/ directory on each machine in your cluster. This directory usually contains several versions, and a symbolic link named current points to the version you should use. For example:
    [root@sandbox /]# ls  -lth /usr/hdp/
    total 16K
    drwxr-xr-x 50 root root 4.0K Jun  5 07:59 2.6.0.3-8
    drwxr-xr-x 32 root root 4.0K May  5 13:19 2.5.0.0-1245
    drwxr-xr-x  3 root root 4.0K May  5 13:18 share
    drwxr-xr-x  2 root root 4.0K May  5 12:48 current -> 2.6.0.3-8

    In this example, the version to be used is hdp.version=2.6.0.3-8.
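The symbolic-link check above can also be scripted. This is a minimal sketch that extracts the version segment from a resolved path; the sample path is hard-coded here for illustration, whereas on a real node you would obtain it with a command such as readlink -f /usr/hdp/current/hadoop-client:

```shell
# Hypothetical sample path; on a real node, resolve it from /usr/hdp/current.
hadoop_path="/usr/hdp/2.6.0.3-8/hadoop"
hdp_version="${hadoop_path#/usr/hdp/}"   # strip the /usr/hdp/ prefix
hdp_version="${hdp_version%%/*}"         # keep only the first path segment
echo "hdp.version=${hdp_version}"        # prints hdp.version=2.6.0.3-8
```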

Results

Next, configure the Job to be used in Talend Studio:
  1. For Spark Jobs, see Resolve the hdp.version variable issue for Spark Jobs.

  2. For MapReduce Jobs, see Resolve the hdp.version variable issue for MapReduce Jobs.

Resolve the hdp.version variable issue for Spark Jobs

Procedure

  1. Define the hdp.version parameter in your cluster.

    The easiest way of doing this is to add this parameter to the yarn-site.xml configuration file.

    1. In Ambari, click the Yarn service in the service list on the left, then click Configs to open the configuration page and click the Advanced tab.
    2. Scroll down the page to find the Custom yarn-site list at the end of the page and click Custom yarn-site to show this list.
    3. Click Add property to open the [Add property] dialog box.
    4. Enter hdp.version=2.6.0.3-8, where 2.6.0.3-8 is the version number you found by following Find the hdp.version value to be used, and click Add to validate the change. The hdp.version parameter appears in the Custom yarn-site parameter list.
    5. Click Save to validate the new configuration and restart the services to apply the hdp.version parameter in the yarn-site.xml file.
  2. In the Studio, open the Spark Job to be used and click the Run tab to open its view.
  3. Click Spark configuration, then in the view, select the Set hdp.version check box and enter, within double quotation marks, the same version number you have entered in the cluster. In this example, it is 2.6.0.3-8.

    This procedure explains only the actions to be performed to solve the HDP version issue for a Spark Job. You still need to properly configure the other parts of your Job before you can run it successfully.
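After the cluster-side change in step 1, the yarn-site.xml file should contain an entry equivalent to the following fragment. The value 2.6.0.3-8 is the example version used above; substitute your own cluster's version:

```xml
<!-- Custom yarn-site property added through Ambari -->
<property>
  <name>hdp.version</name>
  <value>2.6.0.3-8</value>
</property>
```

As the SparkContext error message above suggests, passing -Dhdp.version=2.6.0.3-8 through spark.driver.extraJavaOptions or spark.yarn.am.extraJavaOptions is an alternative way to supply the same value.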

Resolve the hdp.version variable issue for MapReduce Jobs

Procedure

  1. Define the hdp.version parameter in your cluster, specifically in the mapred-site.xml file of the cluster. The Hortonworks cluster reads this file to find the MapReduce application to be used.
    1. In Ambari, click the MapReduce2 service in the service list on the left, then click Configs to open the configuration page and click the Advanced tab.
    2. Scroll down the page to find the Advanced mapred-site list at the end of the page and click Advanced mapred-site to show this list.
    3. Find the mapreduce.application.framework.path parameter. Its value is a path that contains the ${hdp.version} variable.
    4. In this path, replace ${hdp.version} with 2.6.0.3-8, the version number you found by following Find the hdp.version value to be used.
    5. Click Save to validate the new configuration and restart the services to apply the new hdp.version value in the mapred-site.xml file.
  2. In the Studio, open the MapReduce Job to be used and click the Run tab to open its view.
  3. Click Advanced settings, then in the view, select the Use specific JVM arguments check box and add a -Dhdp.version argument with the same version number you entered in the cluster. In this example, add -Dhdp.version=2.6.0.3-8.

    This procedure explains only the actions to be performed to solve the HDP version issue for a MapReduce Job. You still need to properly configure the other parts of your Job before you can run it successfully.
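For reference, here is what the change in step 1 looks like in mapred-site.xml, using the framework path from the error message above and the example version 2.6.0.3-8; your cluster's actual path prefix and version may differ:

```xml
<!-- Before: the path still contains the unresolved variable -->
<property>
  <name>mapreduce.application.framework.path</name>
  <value>/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework</value>
</property>

<!-- After: the variable is replaced with the concrete version -->
<property>
  <name>mapreduce.application.framework.path</name>
  <value>/hdp/apps/2.6.0.3-8/mapreduce/mapreduce.tar.gz#mr-framework</value>
</property>
```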