It is recommended to activate the Spark logging and checkpointing system in the
Spark configuration tab of the Run
view of your Spark Job, so that you can debug the Job and resume it when issues
arise.
Procedure
-
If you need the Job to be resilient to failure, select the Activate checkpointing check box to enable the
Spark checkpointing operation. In the field that is displayed, enter the
directory in which Spark stores, in the file system of the cluster, the context
data of the computations, such as the metadata and the generated RDDs (see the
first sketch after this procedure).
-
In the Yarn client mode, you can
make the Spark application logs of this Job persistent in the file
system. To do this, select the Enable Spark event
logging check box.
The parameters relevant to Spark logs are displayed (see the second sketch after this procedure):
-
Spark event logs directory:
enter the directory in which Spark events are logged. This is actually
the spark.eventLog.dir property.
-
Spark history server address:
enter the location of the history server. This is actually the spark.yarn.historyServer.address
property.
-
Compress Spark event logs: if
need be, select this check box to compress the logs. This is actually
the spark.eventLog.compress property.
Since the administrator of your cluster may have
defined these properties in the cluster configuration files, it is recommended
that you contact the administrator for the exact values.