It is recommended that you activate the Spark logging and checkpointing system in the
Spark configuration tab of the Run
view of your Spark Job, to help you debug and resume the Job when issues
arise.
The information in this section applies only to users who have subscribed to
Talend Data Fabric or to any Talend product with Big Data; it is not
applicable to Talend Open Studio for Big Data users.
Procedure
-
If you need the Job to be resilient to failure, select the Activate checkpointing check box to enable the
Spark checkpointing operation. In the field that is displayed, enter the
directory in the file system of the cluster in which Spark stores the context
data of the computation, such as its metadata and generated RDDs.
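For reference, this check box maps to Spark's standard checkpointing mechanism. The following sketch in Scala illustrates what Spark does with the directory you enter; the directory path used here is a hypothetical example, and this is not the code that the Studio generates.

import org.apache.spark.{SparkConf, SparkContext}

// The master and resources come from the submit command on the cluster.
val sc = new SparkContext(new SparkConf().setAppName("CheckpointingSketch"))

// Directory on the cluster file system where Spark stores checkpoint data
// (metadata and materialized RDDs); hypothetical path for illustration.
sc.setCheckpointDir("hdfs:///user/talend/checkpoints")

val rdd = sc.parallelize(1 to 1000).map(_ * 2)
rdd.checkpoint() // mark the RDD so its data is saved and its lineage truncated
rdd.count()      // the first action materializes the RDD and triggers the checkpoint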
-
In the Yarn client mode or the
Yarn cluster mode, you can make the Spark
application logs of this Job persistent in the file system. To do this,
select the Enable Spark event logging check
box.
The parameters relevant to Spark logs are displayed:
-
Spark event logs directory:
enter the directory in which Spark events are logged. This is actually
the spark.eventLog.dir property.
-
Spark history server address:
enter the location of the history server. This is actually the spark.yarn.historyServer.address
property.
-
Compress Spark event logs: if
need be, select this check box to compress the logs. This is actually
the spark.eventLog.compress property.
Since the administrator of your cluster could have
defined these properties in the cluster configuration files, it is recommended
to contact the administrator for the exact values.
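As an illustration, these settings correspond to standard Spark properties that could also be set directly on a SparkConf, as in the sketch below. The directory and history server address shown are placeholders; use the values provided by your administrator.

import org.apache.spark.SparkConf

// Placeholder values for illustration only; ask your cluster administrator for the real ones.
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")                          // Enable Spark event logging
  .set("spark.eventLog.dir", "hdfs:///spark-history")             // Spark event logs directory
  .set("spark.yarn.historyServer.address", "historyserver:18080") // Spark history server address
  .set("spark.eventLog.compress", "true")                         // Compress Spark event logs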
-
If you want the Spark configuration used by your Job to be printed in the log when
its Spark context starts, add the
spark.logConf property in the Advanced
properties table and enter, within double quotation marks,
true in the Value column of
this table.
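For illustration, spark.logConf is a standard Spark property; the sketch below shows the equivalent programmatic setting, assuming you build the SparkConf yourself rather than through the Studio.

import org.apache.spark.SparkConf

// Equivalent of the Advanced properties entry spark.logConf = "true":
// Spark logs the effective configuration at INFO level when the Spark context starts.
val conf = new SparkConf().set("spark.logConf", "true")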