HBase/MapR-DB Job cannot successfully run with MapR 5.1 or 5.2
When using a MapR 5.1 or 5.2 distribution with Talend HBase or MapR-DB components on Spark, you may get the following error message in the execution log:
Error occurred while instantiating com.mapr.fs.hbase.MapRTableMappingRules. ==> org/apache/hadoop/hbase/client/mapr/BaseTableMappingRules.
    at test3.testspark_0_1.TestSpark$row12StructInputFormat_tHBaseInput_1.configure(TestSpark.java:791)
    ...
Caused by: java.io.IOException: java.lang.RuntimeException: Error occurred while instantiating com.mapr.fs.hbase.MapRTableMappingRules. ==> org/apache/hadoop/hbase/client/mapr/BaseTableMappingRules.
    at org.apache.hadoop.hbase.client.mapr.TableMappingRulesFactory.create(TableMappingRulesFactory.java:68)
    at org.apache.hadoop.hbase.client.HTable.initIfMapRTableImpl(HTable.java:475)
This error occurs because the HBase classpath is missing from the spark.hadoop.yarn.application.classpath parameter that the Talend Spark Job uses by default.
Environment:
Subscription-based Talend solution with Big Data
Spark Job running in YARN mode
Spark Job running with HBase or MapR-DB components (MapR-DB components on Spark are available only in Studio version 6.3.1 and above)
Hadoop distribution MapR 5.1 or 5.2
Adding the HBase classpath
You must add the HBase classpath of your distribution to the spark.hadoop.yarn.application.classpath parameter.
A Talend Spark Job always overrides this parameter with its own default value, so setting it on the cluster side has no effect. To make the Job take your update into account, set the parameter directly in the Spark configuration tab of the Job, not in your cluster configuration.
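The exact value to set depends on where HBase is installed on your cluster. As an illustration only (the path below assumes the usual MapR layout, /opt/mapr/hbase/hbase-<version>/lib, and the hbase-1.1.1 directory is an example; check the actual path and version on your nodes), the parameter entered in the Spark configuration tab could look like this, with the HBase lib directory appended to the existing entries using the same separator as those entries (typically a comma):

Property: spark.hadoop.yarn.application.classpath
Value: <existing classpath entries>,/opt/mapr/hbase/hbase-1.1.1/lib/*

With this entry in place, the YARN containers that run the Spark executors can load the HBase classes (including org.apache.hadoop.hbase.client.mapr.BaseTableMappingRules) from the local MapR installation on each node, which resolves the error shown above.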
This procedure explains only how to add the HBase classpath to a Spark Job; it doesn't show any other configuration required by the Job.