Speeding up Job execution with Apache Spark on YARN
Every time a Spark Job is launched, its dependencies are automatically transferred to the YARN cluster on which the Job is executed. To avoid this time-consuming transfer and shorten the execution time of the Spark Job, upload these dependencies to the cluster manually.
The procedure explained in this article applies only to Talend Jobs running on Spark 2.0 or later. If you are using a Spark version earlier than 2.0, see Upload the assembly file.
Uploading the dependencies and specifying the path
Upload the Spark dependencies to YARN and specify their location in the spark.yarn.jars property.
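The following Python is a minimal sketch of these two steps. It assumes the Spark jars sit in a local directory such as `$SPARK_HOME/jars`, and it uses `hdfs:///user/spark/jars` as a hypothetical world-readable HDFS location. It only builds the `hdfs dfs -mkdir -p` and `hdfs dfs -put -f` commands without running them, and it produces the matching `spark.yarn.jars` value. Spark accepts glob patterns in this property. The function names and the target path are illustrative, not part of Talend or Spark.

```python
from pathlib import Path

# Hypothetical HDFS target directory; replace with a world-readable path on your cluster.
HDFS_JARS_DIR = "hdfs:///user/spark/jars"


def upload_commands(local_jars_dir, hdfs_dir=HDFS_JARS_DIR):
    """Build (but do not run) the `hdfs dfs` commands that copy every .jar
    file found in local_jars_dir (for example $SPARK_HOME/jars) to hdfs_dir."""
    # Expand the glob in Python so the commands work without a shell.
    jars = sorted(str(p) for p in Path(local_jars_dir).glob("*.jar"))
    if not jars:
        raise FileNotFoundError(f"no .jar files found in {local_jars_dir}")
    return [
        # Create the target directory (and parents) if it does not exist yet.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # Copy all jars in one call; -f overwrites jars left by an earlier upload.
        ["hdfs", "dfs", "-put", "-f", *jars, hdfs_dir],
    ]


def spark_yarn_jars_conf(hdfs_dir=HDFS_JARS_DIR):
    """Return the spark.yarn.jars setting that points at the uploaded jars."""
    return f"spark.yarn.jars={hdfs_dir.rstrip('/')}/*.jar"
```

Each command can be run with `subprocess.run(cmd, check=True)` on a host that has an HDFS client. The resulting setting can then be passed to `spark-submit --conf`, or entered wherever your Job defines its Spark properties.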