Talend Studio allows you to deploy and execute your Spark Streaming and Spark Batch Jobs
on a remote JobServer with a CDP Public Cloud Data Hub instance on AWS.
Before you begin
Make sure:
Procedure
-
Connect to your Cloudera Management Console, go to the Data Hub
Clusters tab, and then open the Hardware tab.
-
Make sure you have a gateway host available under the
Gateway section. If no gateway is available, you must
create a new one.
-
Download your JobServer and install it on the gateway host.
-
Connect to your AWS Management Console. In the VPC Management
Console, check the Inbound rules and Outbound rules tabs
and make sure that the ports you set up for the JobServer
are open.
-
Connect to Cloudera Manager and, from the
Clusters tab, download all the configuration files from
your cluster. Unzip them all into the same directory on your local machine.
-
Connect to Talend Studio and
manually set up the Hadoop connection using the Import configuration from
local files option. For more information, see the third step in Setting up the Hadoop connection.
Note:
- You do not have to select a Cloudera version in the drop-down list. Because
Talend Studio uses the configuration files from the CDP Public Cloud instance clusters,
it uses the runtime version defined in those files.
- You must enable SSL and Kerberos.
-
Run your Job on the JobServer. For more information, see
Running a Job remotely.
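Before running the Job, it can help to confirm from your local machine that the JobServer ports opened in the AWS rules are actually reachable. The following is a minimal sketch, not part of Talend tooling: the port numbers in `JOBSERVER_PORTS` are assumed example values and the helper name `check_ports` is hypothetical. Use the ports you configured for your own JobServer.

```python
import socket

# Assumed example ports; replace them with the ports configured for your JobServer.
JOBSERVER_PORTS = [8000, 8001, 8888]


def check_ports(host, ports, timeout=3.0):
    """Return a dict mapping each port to True if a TCP connection succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

For example, `check_ports("<gateway-host>", JOBSERVER_PORTS)` returns one entry per port. A `False` value can mean that a rule is closed or that the JobServer is not yet listening on that port, so check both.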
Results
You can now use a CDP Public Cloud Data Hub instance on AWS with Talend Studio.
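For the configuration-file step of the procedure, the sketch below shows one way to unzip several downloaded archives into a single directory and to read a property from the result. It assumes the archives are ZIP files that contain `*-site.xml` files in the usual Hadoop layout. The helper names `extract_configs` and `read_property` are hypothetical. `hadoop.security.authentication` is the standard Hadoop property whose value is `kerberos` on a Kerberos-secured cluster.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_configs(zip_paths, target_dir):
    """Extract every *.xml file from the given archives into one flat directory."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.namelist():
                # Drop any folder prefix inside the archive; if two archives
                # contain the same file name, the later one overwrites the earlier.
                name = Path(member).name
                if name.endswith(".xml"):
                    (target / name).write_bytes(archive.read(member))
                    extracted.append(name)
    return sorted(set(extracted))


def read_property(site_xml, key):
    """Return the value of a Hadoop property from a *-site.xml file, or None."""
    root = ET.parse(site_xml).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None
```

After extraction, `read_property(Path(target_dir) / "core-site.xml", "hadoop.security.authentication")` gives a quick indication of whether the cluster configuration expects Kerberos, which matches the requirement in the note above.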