Amazon EMR - Update Cluster Connection Metadata

author
Frédérique Martin Sainte-Agathe
EnrichVersion
6.5
EnrichProdName
Talend Open Studio for Big Data
Talend Data Fabric
Talend Big Data
Talend Real-Time Big Data Platform
Talend Big Data Platform
task
Design and Development > Designing Jobs > Hadoop distributions > Amazon EMR
EnrichPlatform
Talend Studio

Amazon EMR - Updating Cluster Connection Metadata

This article shows how to update the Amazon EMR cluster connection metadata in Talend Studio.

This example uses these licensed products provided by Amazon:

  • Amazon EC2
  • Amazon EMR

    For more information about launching an Amazon EMR cluster from Talend Studio, see Amazon EMR - Getting Started.

Updating cluster connection metadata

Before you begin

Each time you start a new cluster, you must update the cluster connection metadata in the Talend Studio Repository.

After you start the new cluster from the Amazon EMR web interface, note the new private IP address and private DNS name of the cluster master node. You need both values to update the hosts file:

  • On a Windows instance, navigate to C:\Windows\System32\drivers\etc\ and open the hosts file.
  • On a Linux instance, open the /etc/hosts file.

Then add an entry that maps the private IP address of the master node to its private DNS name.
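
The hosts file change can be sketched in Python as follows. This is a minimal illustration, not part of the Talend procedure: the function name upsert_hosts_entry, the IP address 10.0.0.12, and the DNS name ip-10-0-0-12.ec2.internal are placeholders invented for the example. It assumes the usual hosts file layout of one address followed by one or more host names per line, and it works on text only, so nothing is written to the real file.

```python
from typing import List


def upsert_hosts_entry(hosts_text: str, ip: str, hostname: str) -> str:
    """Return hosts-file text in which `hostname` maps only to `ip`.

    Simplification: a line that already names `hostname` is dropped as a
    whole, including any other aliases on that line.
    """
    kept: List[str] = []
    for line in hosts_text.splitlines():
        # Ignore any trailing comment when looking for the host name.
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the remaining fields are host names.
        if len(fields) >= 2 and hostname in fields[1:]:
            continue  # drop an earlier mapping for the same host name
        kept.append(line)
    kept.append(f"{ip}\t{hostname}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Placeholder values (assumptions), not taken from a real cluster.
    sample = "127.0.0.1\tlocalhost\n"
    print(upsert_hosts_entry(sample, "10.0.0.12", "ip-10-0-0-12.ec2.internal"), end="")
```

Because an existing line for the same host name is removed before the new one is appended, applying the function twice with the same values leaves the text unchanged. Editing the real hosts file usually requires administrator or root privileges.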

To update the Hadoop cluster metadata in the Talend Studio with the connection information of the new cluster, do the following:

Procedure

  1. In the Talend Studio Repository, double-click your Hadoop cluster connection metadata and click Next.
  2. In the Update Hadoop Cluster Connection - Step 2/2 window, replace the private DNS values with those of the new cluster master node.

    Click Finish.

  3. In the pop-up message, click Yes to propagate the new configuration to all Jobs.
  4. Click OK to update all your Jobs.

    If you previously created a connection to HDFS, it will also be updated.
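
Before running Jobs against the new cluster, it can help to confirm that the hosts file entry is effective, that is, that the private DNS name of the master node resolves to its private IP address on the machine being used. The following Python sketch is one possible check, not a Talend feature: the function name resolves_to and the values in the usage example are placeholders. The resolver parameter is there only so that the lookup can be replaced in tests; by default it uses socket.gethostbyname from the standard library.

```python
import socket
from typing import Callable


def resolves_to(hostname: str, expected_ip: str,
                resolver: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if `hostname` resolves to `expected_ip`."""
    try:
        return resolver(hostname) == expected_ip
    except OSError:
        # socket.gaierror is a subclass of OSError: the name did not resolve.
        return False


if __name__ == "__main__":
    # Placeholder values (assumptions); replace with those of your cluster.
    ok = resolves_to("ip-10-0-0-12.ec2.internal", "10.0.0.12")
    print("hosts entry effective" if ok else "name does not resolve as expected")
```

A False result suggests that the hosts file entry is missing, mistyped, or still points to a previous cluster.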