Use a Qubole distribution in Talend Studio
- A Talend Studio from V7.0 onwards.
- A Talend JobServer from V7.0 onwards (same version as the Studio).
- The Qubole distribution zip file downloaded from Talend Exchange.
Configure the JobServer
Configure the JobServer so that your Job can be run remotely on it.
A Talend JobServer is available only in a subscription-based Talend solution. If you are using a community solution, skip this section.
Procedure
Results
The JobServer is now ready to be used to run your Job remotely.
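On a Linux machine, the JobServer is typically started from its install directory. A minimal sketch, assuming the install path `/opt/Talend-JobServer` and the `start_rs.sh` launcher script shipped with the JobServer package (both are assumptions; adjust them to your installation):

```shell
# Assumed install location; change to your actual JobServer directory.
JOBSERVER_HOME="${JOBSERVER_HOME:-/opt/Talend-JobServer}"

if [ -x "$JOBSERVER_HOME/start_rs.sh" ]; then
  # Start the JobServer in the background from its own directory.
  (cd "$JOBSERVER_HOME" && ./start_rs.sh &)
  echo "JobServer start requested from $JOBSERVER_HOME"
else
  echo "JobServer start script not found under $JOBSERVER_HOME"
fi
```

Once the JobServer process is running, declare it in the Studio preferences so it appears as a remote execution target.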
Define the Qubole connection
Procedure
Results
The Qubole connection is now ready to be used, for example, in a Talend Job.
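Outside the Studio, you can sanity-check the API token used by the connection against the Qubole REST API. A hedged sketch, assuming the default `api.qubole.com` endpoint and a `QUBOLE_TOKEN` environment variable holding your token (region-specific Qubole environments use a different endpoint):

```shell
# Assumed default endpoint; replace for region-specific environments.
QUBOLE_API="https://api.qubole.com/api/v1.2"

if [ -n "$QUBOLE_TOKEN" ]; then
  # List the clusters visible to this token; a non-empty JSON reply
  # indicates the token and endpoint are valid.
  curl -sf --max-time 15 \
       -H "X-AUTH-TOKEN: $QUBOLE_TOKEN" \
       -H "Accept: application/json" \
       "$QUBOLE_API/clusters" \
    || echo "Token check failed; verify the token and the endpoint"
else
  echo "Set QUBOLE_TOKEN before running this check"
fi
```

If the check fails, correct the token or endpoint before wiring the connection into a Job.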
Use the Qubole connection in a Spark Job
Use the Qubole connection previously defined in the Repository in a Talend Job.
In this example, a Job for Apache Spark is used. This type of Job is available only in a subscription-based Talend solution with Big Data. If you are using a community version of the Studio, you can create a Standard Job (a traditional Data Integration Job), for example with the HDFS components, to use this Qubole connection.
Procedure
Results
The Qubole-related configuration for your Job is complete. Once you have finished developing your Job, you can run it.
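A Job built from the Studio can also be executed from the command line on the JobServer machine. A minimal sketch, assuming a build directory `./my_qubole_job` and a launcher script named `my_qubole_job_run.sh` (both names are hypothetical; check your actual build output for the generated script name):

```shell
# Hypothetical build directory produced by the Studio's Build Job export.
JOB_DIR="${JOB_DIR:-./my_qubole_job}"

if [ -x "$JOB_DIR/my_qubole_job_run.sh" ]; then
  # Launch the built Job with its generated shell launcher.
  "$JOB_DIR/my_qubole_job_run.sh"
else
  echo "Build the Job from the Studio first, then run its launcher script"
fi
```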
Qubole support matrix
The following table presents the supported and unsupported items of the Qubole configuration zip.
The term "supported" means Talend went through a complete QA validation process.
|  | Supported | Unsupported |
|---|---|---|
| On the Qubole cluster side |  |  |
| On the Studio side |  |  |
Known issue: SPARK_HOME not found
When running a Spark Job, you may encounter the following issue.
```
[ERROR]: org.apache.spark.SparkContext - Error initializing SparkContext.
java.util.NoSuchElementException: key not found: SPARK_HOME
```
To resolve this issue, install a Spark client on the machine where the Job is executed. This machine is typically the one on which the JobServer has been installed.
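After installing the Spark client, verify that `SPARK_HOME` resolves on that machine before rerunning the Job. A hedged sketch, assuming `/opt/spark` as the install path (a placeholder; point it at your actual Spark client directory):

```shell
# Placeholder path; set this to where the Spark client is actually installed.
export SPARK_HOME="${SPARK_HOME:-/opt/spark}"

if [ -x "$SPARK_HOME/bin/spark-submit" ]; then
  echo "SPARK_HOME is set to $SPARK_HOME"
else
  echo "No Spark client found under $SPARK_HOME"
fi
```

Setting the variable in the shell profile of the user running the JobServer makes it visible to Jobs launched remotely.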