Create the cluster metadata in Talend Studio

It is recommended to create a new Hadoop cluster metadata definition in the Repository. This allows you to easily reuse the connection information for your EMR cluster.

Procedure

  1. In the Import option window, select the options for the Amazon EMR 4.0.0 distribution, then select Enter manually Hadoop services and click Finish.
  2. In the window that pops up, replace localhost and 0.0.0.0 with the private DNS of the master node.
  3. In the User name field, enter hadoop.
  4. Click Check Services to verify your connection to the cluster.

    If Talend Studio cannot connect to your Amazon EMR cluster, first double-check the cluster connection configuration. If the configuration is correct, double-check the hosts file on the machine running Talend Studio. If the hosts file is also correct, investigate connectivity issues between the EC2 instance hosting Talend Studio and the cluster master node. Such issues can arise from the firewall of your Talend Studio instance or from the security group rules that are set for your Talend Studio and cluster instances.
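
When the configuration and the hosts file both look correct, a quick TCP probe run from the instance hosting Talend Studio can help tell a network problem (firewall or security group) from a Studio-side problem. The following is a minimal sketch, not part of Talend Studio: the helper names `check_port` and `check_services` are hypothetical, and the ports listed (8020 for the NameNode, 8032 for the ResourceManager) are assumed defaults that you should confirm against your own cluster.

```python
import socket

# Assumed default service ports; confirm them against your cluster's configuration.
DEFAULT_PORTS = {"NameNode": 8020, "ResourceManager": 8032}


def check_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_services(host, ports=None, timeout=3.0):
    """Probe each named port on host and return a {service name: reachable} dict."""
    if ports is None:
        ports = DEFAULT_PORTS
    return {name: check_port(host, port, timeout) for name, port in ports.items()}


# Example call (the host name below is a placeholder for your master node's private DNS):
# print(check_services("ip-10-0-0-12.ec2.internal"))
```

A port that is reported as unreachable while the host name resolves correctly usually points to a firewall or security group rule rather than to the metadata definition itself.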

Results

The connection metadata is ready. Note, however, that the cluster metadata must be updated each time you start a new cluster, because a different private IP address and DNS name are assigned to the master node.
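
Because the master node address changes with every new cluster, the replacement of localhost and 0.0.0.0 has to be repeated each time. The sketch below only illustrates that substitution on plain strings; it does not read or write Talend Studio metadata. The function name `point_to_master`, the sample addresses, and the sample DNS name are hypothetical.

```python
# Placeholder hosts that the default service addresses contain.
PLACEHOLDER_HOSTS = ("localhost", "0.0.0.0")


def point_to_master(address, master_dns):
    """Replace a placeholder host in 'host:port' or 'scheme://host:port' with master_dns.

    Addresses that already name another host are returned unchanged.
    """
    # Split off an optional 'scheme://' prefix, then separate host from port.
    prefix, sep, rest = address.rpartition("//")
    host, colon, port = rest.partition(":")
    if host in PLACEHOLDER_HOSTS:
        host = master_dns
    return f"{prefix}{sep}{host}{colon}{port}"


# Hypothetical default values, similar to what the wizard pre-fills.
defaults = ["hdfs://localhost:8020", "0.0.0.0:8032"]
master = "ip-10-0-0-12.ec2.internal"  # placeholder private DNS of the master node
updated = [point_to_master(a, master) for a in defaults]
```

With the sample values above, `updated` holds the same addresses with the placeholder hosts replaced by the master node's private DNS and the ports left untouched.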
