Complete the HD Insight connection configuration in the Spark configuration tab of the Run view of your Job. This configuration is effective on a per-Job basis.
Only the Yarn client mode is available for this type of cluster.
Enter the basic connection information to Microsoft HD Insight:
The Hostname of Livy uses the following syntax: your_hdinsight_cluster_name.azurehdinsight.net. For further information about the Livy service used by HD Insight, see Submit Spark jobs using Livy.
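As a quick illustration of the hostname syntax above, the following minimal sketch builds the Livy hostname from a cluster name. The function name `livy_hostname` and the input check are hypothetical and used only for illustration; they are not part of the Studio or of HD Insight.

```python
def livy_hostname(cluster_name: str) -> str:
    """Build the Livy hostname from an HD Insight cluster name.

    Follows the syntax given in the text:
    your_hdinsight_cluster_name.azurehdinsight.net
    """
    name = cluster_name.strip()
    # Illustrative sanity check (an assumption, not an official rule):
    # the field expects a bare cluster name, not a URL or a full hostname.
    if not name or "/" in name or "." in name:
        raise ValueError(f"expected a bare cluster name, got {cluster_name!r}")
    return f"{name}.azurehdinsight.net"


if __name__ == "__main__":
    # "mycluster" is a placeholder cluster name.
    print(livy_hostname("mycluster"))
```

Replace the placeholder `mycluster` with the name of your own cluster before entering the result in the Hostname field.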
Enter the authentication information of the HD Insight cluster to be used.
Windows Azure Storage configuration
Enter the address and the authentication information of the Azure Storage account to be used. This configuration defines only where to deploy your Job, not where to read or write your business data. Therefore, always use the Azure Blob Storage system for this configuration.
In the Container field, enter the name of the container to be used.
In the Deployment Blob field, enter the location in which you want to store the current Job and its dependent libraries in this Azure Storage account.
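To see how the storage address, the container and the deployment blob relate to each other, the sketch below combines them into a URI in the `wasb`/`wasbs` form used by the Hadoop Azure Blob Storage connector. This is only an illustration: the source does not say how the Studio assembles these fields internally, and the function name `deployment_uri` and all sample values are hypothetical.

```python
def deployment_uri(hostname: str, container: str, deployment_blob: str,
                   secure: bool = True) -> str:
    """Combine the three fields into a wasb(s) URI.

    hostname        -- address of the Azure Storage account (placeholder value below)
    container       -- value of the Container field
    deployment_blob -- value of the Deployment Blob field
    """
    scheme = "wasbs" if secure else "wasb"
    # Drop leading and trailing slashes so the path joins cleanly.
    path = deployment_blob.strip("/")
    return f"{scheme}://{container}@{hostname}/{path}"


if __name__ == "__main__":
    # All three values are placeholders, not defaults of the Studio.
    print(deployment_uri("myaccount.blob.core.windows.net", "jobs", "/talend/deploy"))
```

The Job and its dependent libraries would then sit under the path part of this URI, separate from any location your Job uses for business data.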
- With the Yarn client mode, the Property type list is displayed. If you have already created a Hadoop connection in the Repository, you can select it from this list, and the Studio then reuses that set of connection information for this Job.
- In the Spark "scratch" directory field, enter the local directory in which the Studio stores temporary files, such as the jar files to be transferred. If you launch the Job on Windows, the default disk is C:, so if you leave /tmp in this field, the directory used is C:/tmp.
- Select the Wait for the Job to complete check box to make your Studio or, if you use Talend Jobserver, your Job JVM keep monitoring the Job until its execution is over. Selecting this check box sets the spark.yarn.submit.waitAppCompletion property to true. It is generally useful to select this check box when running a Spark Batch Job, but it makes more sense to keep it clear when running a Spark Streaming Job.
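The last two options can be pictured with a short sketch. It shows how a scratch directory such as /tmp maps to C:/tmp on Windows, and which Spark property the Wait for the Job to complete check box corresponds to. Both function names are hypothetical, and the mapping of a cleared check box to `false` is an assumption; the text only states that selecting it sets the property to true.

```python
def resolve_scratch_dir(path: str, on_windows: bool, default_drive: str = "C:") -> str:
    """Return the local directory actually used for the scratch files.

    On Windows, a path without a drive letter (such as /tmp) is taken
    relative to the default disk, C: according to the text.
    """
    if on_windows and path.startswith("/"):
        return default_drive + path
    # Paths on other systems, or paths that already name a drive, are unchanged.
    return path


def wait_property(wait_for_completion: bool) -> dict:
    """Map the check box state to the Spark property named in the text.

    Assumption: a cleared check box corresponds to the value "false".
    """
    return {"spark.yarn.submit.waitAppCompletion": str(wait_for_completion).lower()}


if __name__ == "__main__":
    print(resolve_scratch_dir("/tmp", on_windows=True))
    print(wait_property(True))
```

For a Spark Streaming Job, which normally runs until it is stopped, the second function would be called with `False`, matching the recommendation to keep the check box clear.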
It is recommended to activate the Spark logging and checkpointing system in the Spark configuration tab of the Run view of your Spark Job, in order to help debug and resume your Spark Job when issues arise: