Adding advanced Spark properties to solve issues

Spark Batch

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Real-Time Big Data Platform
Talend Big Data
Talend Data Fabric
Talend Big Data Platform
task
Design and Development > Designing Jobs > Job Frameworks > Spark Batch
EnrichPlatform
Talend Studio

Depending on the distribution you are using or the issues you encounter, you may need to add specific Spark properties to the Advanced properties table in the Spark configuration tab of the Run view of your Job.

The information in this section is only for users who have subscribed to Talend Data Fabric or to any Talend product with Big Data. It does not apply to Talend Open Studio for Big Data users.

For further information about the valid Spark properties, see the Spark documentation at https://spark.apache.org/docs/latest/configuration.

The advanced properties required by different Hadoop distributions, together with their values, are listed below:

Hortonworks Data Platform V2.4

  • spark.yarn.am.extraJavaOptions: -Dhdp.version=2.4.0.0-169

  • spark.driver.extraJavaOptions: -Dhdp.version=2.4.0.0-169

In addition, you need to add -Dhdp.version=2.4.0.0-169 to the JVM settings area, either in the Advanced settings tab of the Run view or in the Talend > Run/Debug view of the [Preferences] window. Setting this argument in the [Preferences] window applies it to all the Jobs that are designed in the same Studio.
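The two advanced properties and the JVM setting all carry the same -Dhdp.version argument, so a mismatch between them is an easy mistake to make. The following Python sketch is only an illustration of that relationship and is not part of Talend Studio: the helper name hdp_spark_properties and the dictionary layout are hypothetical, and the version string is the one given above. Replace it with the build number of your own cluster.

```python
def hdp_spark_properties(hdp_version: str) -> dict:
    """Build the values to enter in Studio for a given HDP build (hypothetical helper)."""
    # The same JVM argument is used for both Spark properties and the JVM settings area.
    jvm_arg = f"-Dhdp.version={hdp_version}"
    return {
        # Rows to add to the Advanced properties table of the Spark configuration tab.
        "advanced_properties": {
            "spark.yarn.am.extraJavaOptions": jvm_arg,
            "spark.driver.extraJavaOptions": jvm_arg,
        },
        # Argument to add to the JVM settings area.
        "jvm_setting": jvm_arg,
    }


settings = hdp_spark_properties("2.4.0.0-169")
for name, value in settings["advanced_properties"].items():
    print(f"{name}: {value}")
print("JVM setting:", settings["jvm_setting"])
```

Deriving all three values from a single version string keeps them consistent when the cluster is upgraded to a different HDP build.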

MapR V5.1 and V5.2

When the cluster is used with the HBase or the MapRDB components:

spark.hadoop.yarn.application.classpath: enter the value of this parameter that is specific to your cluster and, if it is missing, add the classpath to HBase so that the Job can find the required classes and packages in the cluster.

For example, if the HBase version installed in the cluster is 1.1.1, copy and paste all the paths defined with the spark.hadoop.yarn.application.classpath parameter from your cluster and then add /opt/mapr/hbase/hbase-1.1.1/lib/* and /opt/mapr/lib/* to these paths, separating each path with a comma (,). The added paths are where HBase is usually installed in a MapR cluster. If your HBase is installed elsewhere, contact the administrator of your cluster for details and adapt these paths accordingly.
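The "add, if missing" step can be sketched in Python as follows. This is a minimal illustration and not a Talend or MapR tool: the function name add_hbase_paths is hypothetical, the sample cluster value is a placeholder rather than a real cluster setting, and the two HBase paths assume the usual MapR installation locations under /opt/mapr.

```python
def add_hbase_paths(cluster_classpath: str, hbase_version: str) -> str:
    """Append the usual MapR HBase paths to a comma-separated classpath if absent."""
    # Assumed default locations; adapt them if HBase is installed elsewhere.
    required = [
        f"/opt/mapr/hbase/hbase-{hbase_version}/lib/*",
        "/opt/mapr/lib/*",
    ]
    # Split the value copied from the cluster and drop empty entries.
    entries = [p.strip() for p in cluster_classpath.split(",") if p.strip()]
    for path in required:
        if path not in entries:
            entries.append(path)
    # The property value is a single comma-separated string.
    return ",".join(entries)


# Placeholder value for illustration only; copy the real one from your cluster.
example_cluster_value = "$HADOOP_CONF_DIR,/opt/mapr/lib/*"
print(add_hbase_paths(example_cluster_value, "1.1.1"))
```

The resulting string is what you would enter as the value of spark.hadoop.yarn.application.classpath in the Advanced properties table. Paths already present are left in place, so running the helper twice does not duplicate entries.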

For a step-by-step explanation of how to add this parameter, see HBase/MapR-DB Jobs cannot run successfully with MapR.