Configuring and running your Spark Job with CDP Public Cloud Data Hub on AWS - 7.3


Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for ESB
Talend Real-Time Big Data Platform
Talend Studio
Design and Development > Designing Jobs > Hadoop distributions > Cloudera

Talend Studio allows you to deploy and execute your Spark Streaming and Spark Batch Jobs on a remote JobServer with a CDP Public Cloud Data Hub on AWS instance.

Before you begin

Complete the following steps:


  1. Connect to your Cloudera Management console and go to the Data Hub Clusters tab, then to the Hardware tab.
  2. Make sure a gateway host is available under the Gateway section. If no gateway is available, you must create one.
  3. Download the Talend JobServer and install it on the gateway host.
  4. Connect to your AWS Management Console and, from the VPC Management Console, make sure that the ports you set up for the JobServer are open in both the Inbound rules and Outbound rules tabs.
  5. Connect to Cloudera Manager and, from the Clusters tab, download all the configuration files from your cluster and unzip them all into the same path on your local machine.
  6. Connect to Talend Studio and manually set up the Hadoop connection using the Import configuration from local files option. For more information, see the third step in Setting up the Hadoop connection.
    • You do not have to select a Cloudera version from the drop-down list: because Talend Studio uses the configuration files from the CDP Public Cloud cluster, it uses the runtime version defined in those files.
    • You must enable SSL and Kerberos.
  7. Run your Job on the JobServer. For more information, see Running a Job remotely.
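The port check in step 4 can also be verified from the client side. The sketch below is an assumption-laden example, not part of the official procedure: the host name is a placeholder, and the port numbers must match the ports your JobServer is actually configured to use (check the JobServer configuration in its conf directory). It assumes a Bash shell with GNU coreutils available.

```shell
# Client-side reachability check for step 4 (sketch; host and ports
# are placeholders, replace them with your gateway host and the ports
# configured for your JobServer).
check_port() {
  # Succeeds if a TCP connection to host $1, port $2 opens within 5 seconds.
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

GATEWAY_HOST="${GATEWAY_HOST:-gateway.example.com}"  # placeholder host
for port in 8000 8001; do                            # adjust to your setup
  if check_port "$GATEWAY_HOST" "$port"; then
    echo "port $port: reachable"
  else
    echo "port $port: NOT reachable, check the security group rules"
  fi
done
```

A port that is open in the security group but unreachable here usually points to a network ACL or routing issue between your machine and the gateway host.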
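The configuration-file step above (step 5) can be sketched as a short shell script. The download location and archive names below are assumptions for illustration; substitute the actual archives you downloaded from Cloudera Manager. The key point is that every archive is extracted into the same directory, so Talend Studio can import the Hadoop configuration from a single path.

```shell
# Minimal sketch of step 5, assuming the client configuration archives
# were saved to ~/Downloads (placeholder path and file name pattern).
CONF_DIR="${TMPDIR:-/tmp}/cdp-cluster-config"   # choose any local path
mkdir -p "$CONF_DIR"

# -o overwrites files left over from a previous download, so the
# directory always holds one consistent set of configuration files.
for archive in "$HOME"/Downloads/*-clientconfig.zip; do
  [ -e "$archive" ] || continue   # skip if nothing was downloaded yet
  unzip -o "$archive" -d "$CONF_DIR"
done
echo "Configuration files extracted to $CONF_DIR"
```

Point the Import configuration from local files option in Talend Studio at this directory.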


You are now able to use a CDP Public Cloud Data Hub on AWS instance with Talend Studio.