HBase/MapR-DB Job cannot successfully run with MapR 5.1 or 5.2

Applies to: Talend Studio 6.4 (Talend Real-Time Big Data Platform, Talend Data Fabric, Talend Big Data Platform, Talend Big Data)


When you use a MapR 5.1 or 5.2 distribution with Talend HBase or MapR-DB components in a Spark Job, you may get the following error in the execution log:

    Error occurred while instantiating com.mapr.fs.hbase.MapRTableMappingRules.
    ==> org/apache/hadoop/hbase/client/mapr/BaseTableMappingRules.
        at test3.testspark_0_1.TestSpark$row12StructInputFormat_tHBaseInput_1.configure(TestSpark.java:791)
        ...
    Caused by: java.io.IOException: java.lang.RuntimeException: Error occurred while instantiating com.mapr.fs.hbase.MapRTableMappingRules.
    ==> org/apache/hadoop/hbase/client/mapr/BaseTableMappingRules.
        at org.apache.hadoop.hbase.client.mapr.TableMappingRulesFactory.create(TableMappingRulesFactory.java:68)
        at org.apache.hadoop.hbase.client.HTable.initIfMapRTableImpl(HTable.java:475)

This error occurs because the HBase classpath is missing from the spark.hadoop.yarn.application.classpath parameter that the Talend Spark Job uses by default.

Environment:

  • Subscription-based Talend solution with Big Data

  • Spark Job running in YARN mode

  • Spark Job running with HBase or MapR-DB components (MapR-DB components for Spark Jobs are available only in Talend Studio 6.3.1 and later)

  • Hadoop distribution MapR 5.1 or 5.2

Adding the HBase classpath

You must add the HBase classpath to the spark.hadoop.yarn.application.classpath parameter.

The value defined for this parameter in your cluster is always overridden by the default value that the Talend Spark Job sets for it. For the Job to take your change into account, you must therefore set the parameter directly on the Spark configuration tab of the Job, not in your cluster.
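The following Java sketch is a simplified model of this precedence, not Talend's actual implementation: the cluster settings are loaded first and the properties set by the Job are applied on top of them, so a key defined in both keeps only the Job's value. The map-based merge and all paths in it are hypothetical placeholders chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ClasspathPrecedence {

    static final String KEY = "spark.hadoop.yarn.application.classpath";

    // Simplified model (assumption): Job-level properties are applied after
    // the cluster-level ones, so the Job value replaces the cluster value
    // for the same key instead of being appended to it.
    static Map<String, String> effectiveConfig(Map<String, String> cluster,
                                               Map<String, String> job) {
        Map<String, String> merged = new HashMap<>(cluster);
        merged.putAll(job);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> cluster = new HashMap<>();
        // Hypothetical cluster value that already lists the HBase libraries.
        cluster.put(KEY, "/path/from/cluster/*,/opt/mapr/hbase/hbase-1.1.1/lib/*");

        Map<String, String> job = new HashMap<>();
        // Hypothetical Job default that does not list HBase.
        job.put(KEY, "/path/from/job/*");

        // Prints only the Job value: the HBase entry from the cluster is lost.
        System.out.println(effectiveConfig(cluster, job).get(KEY));
    }
}
```

In this model, the cluster's HBase entry disappears as soon as the Job defines the same key. This is why the value you enter in the Job must contain the cluster paths as well as the HBase paths.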

This procedure explains only how to add the HBase classpath to a Spark Job. It does not cover any other configuration that the Job may require.

Procedure

  1. In the Run view of the Job, click the Spark configuration tab.
  2. In the Advanced properties table, in the Property column, enter spark.hadoop.yarn.application.classpath within double quotation marks.
  3. Copy all the paths defined for the spark.hadoop.yarn.application.classpath parameter in your cluster, and paste them into the Value column.
  4. Append the HBase classpath to these paths, separating each path with a comma.

    For example, if the HBase version installed in the cluster is 1.1.1, add /opt/mapr/hbase/hbase-1.1.1/lib/*,/opt/mapr/lib/*. These are the locations where these libraries are usually installed in a MapR cluster. If HBase is installed elsewhere in your cluster, ask your cluster administrator for the correct locations and adapt these paths accordingly.
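As an illustration of steps 3 and 4, the following Java sketch assembles the comma-separated value to enter in the Value column. The cluster entries in main are hypothetical placeholders. Replace them with the paths you actually copied from your cluster, and adapt the HBase entries to your installation. The class and method names are also hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class HBaseClasspathValue {

    // Appends the HBase entries after the cluster entries, keeping the
    // original order, dropping blank entries and skipping duplicates,
    // then joins everything with commas.
    static String buildClasspath(List<String> clusterPaths, List<String> hbasePaths) {
        Set<String> entries = new LinkedHashSet<>();
        addTrimmed(entries, clusterPaths);
        addTrimmed(entries, hbasePaths);
        return String.join(",", entries);
    }

    // Adds each non-blank path, without surrounding whitespace, to the set.
    private static void addTrimmed(Set<String> target, List<String> paths) {
        for (String path : paths) {
            String trimmed = path.trim();
            if (!trimmed.isEmpty()) {
                target.add(trimmed);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical value copied from the cluster in step 3.
        String fromCluster = "/example/cluster/lib/*,$HADOOP_CONF_DIR";
        // Example HBase 1.1.1 paths from step 4; adapt them to your cluster.
        List<String> hbasePaths = Arrays.asList(
                "/opt/mapr/hbase/hbase-1.1.1/lib/*", "/opt/mapr/lib/*");
        String value = buildClasspath(Arrays.asList(fromCluster.split(",")), hbasePaths);
        System.out.println(value);
    }
}
```

The LinkedHashSet keeps the cluster paths in their original order and avoids adding an HBase path twice if the cluster value already contains it. This de-duplication is a convenience choice of the sketch, not a requirement of the procedure.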