Defining the HD Insight connection parameters - 6.5

Spark Batch

Talend Documentation Team
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
Talend Studio

Complete the HD Insight connection configuration in the Spark configuration tab of the Run view of your Job. This configuration is effective on a per-Job basis.

Only the Yarn cluster mode is available for this type of cluster.

The information in this section applies only to users who have subscribed to Talend Data Fabric or to any Talend product with Big Data. It does not apply to Talend Open Studio for Big Data users.


  1. Enter the basic connection information for Microsoft HD Insight:

    Livy configuration

    The Hostname of Livy uses the following syntax:

    For further information about the Livy service used by HD Insight, see Submit Spark jobs using Livy.

    HDInsight configuration

    Enter the authentication information of the HD Insight cluster to be used.

    Windows Azure Storage configuration

    Enter the address and the authentication information of the Azure Storage account to be used.

    In the Container field, enter the name of the container to be used.

    In the Deployment Blob field, enter the location in which you want to store the current Job and its dependent libraries in this Azure Storage account.

  2. In the Spark "scratch" directory field, enter the local directory in which the Studio stores temporary files, such as the jar files to be transferred. If you launch the Job on Windows, the default disk is C:, so if you leave /tmp in this field, the directory used is C:/tmp.
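
The values entered in the steps above can be illustrated with a minimal Python sketch. It is not Talend code and does not reflect how the Studio is implemented. The function names, the example account, container and blob values are all hypothetical. The sketch assumes that the Container and Deployment Blob fields combine into a standard Hadoop WASB address of the form wasbs://container@account-host/path, and it reproduces the Windows rule described in step 2, where a directory without a drive letter lands on the default disk.

```python
from pathlib import PureWindowsPath


def deployment_uri(account_host: str, container: str, deployment_blob: str) -> str:
    """Build a WASB address from the Azure Storage fields (assumed layout).

    account_host    -- address of the storage account (hypothetical example value)
    container       -- value of the Container field
    deployment_blob -- value of the Deployment Blob field
    """
    # Strip surrounding slashes so the path joins cleanly after the host.
    blob_path = deployment_blob.strip("/")
    return f"wasbs://{container}@{account_host}/{blob_path}"


def scratch_dir_on_windows(field_value: str, default_drive: str = "C:") -> str:
    """Show where a scratch directory value ends up on Windows.

    A value without a drive letter (for example /tmp) is placed on the
    default drive; a value that already names a drive is kept as entered.
    """
    resolved = PureWindowsPath(default_drive + "\\") / field_value
    return resolved.as_posix()


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(deployment_uri("myaccount.blob.core.windows.net", "jobs", "/talend/deploy"))
    print(scratch_dir_on_windows("/tmp"))
    print(scratch_dir_on_windows("D:/work/tmp"))
```

The second function uses PureWindowsPath so that the Windows path rule can be checked on any operating system, without touching the file system.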


After the connection is configured, you can optionally tune the Spark performance by following the process explained in: