Amazon EMR - Getting Started
Environment:
The examples use Talend Studio with Big Data. They also use the following licensed products provided by Amazon:
- Amazon EC2
- Amazon EMR
To perform the steps listed below, you must have an AWS account. If you do not have one, follow the instructions in the Creating an Amazon Web Services Account video.
Launch and Connect to an Amazon EC2 instance
Procedure
Recommended settings of an EC2 instance
It is recommended to configure your EC2 instance as follows.
For more explanations about the EC2 settings, see Launching an Instance in the Amazon documentation.
Procedure
Install and Start Talend Studio
Before you begin
Procedure
- Install Talend Studio as described in the section about how to install Talend Studio in the Talend Installation Guide.
- Once it is installed, start your Studio and create a new project. See Creating a project in the Talend Studio User Guide.
Launch an Amazon EMR cluster from the Talend Studio
Getting your Amazon Credentials
To access the Amazon services, you need your Amazon credentials (access key and secret access key).
If the security policy of your organization does not allow you to expose the credentials explicitly in a client application such as a Job, skip this section and use the inherit credentials from AWS role check box, which is explained later in this article.
Procedure
Define roles in Amazon EMR
Procedure
Start an Amazon EMR cluster
Procedure
Results
A new cluster is launched. You can verify it from the Amazon EMR home page:
You can also check the status from the EC2 instances list:
In the Studio, the console in the Run view shows the following message:
Your cluster is now ready.
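The cluster status can also be checked from a script rather than from the console. The following is a minimal sketch, not part of the Talend procedure. It assumes you pass in a boto3 EMR client (for example `boto3.client("emr")`) and a cluster ID of your own. The helper names `cluster_state` and `is_ready` are hypothetical.

```python
# Cluster states reported by Amazon EMR once a cluster can accept work.
READY_STATES = {"WAITING", "RUNNING"}


def cluster_state(emr_client, cluster_id):
    """Return the current state string of an EMR cluster.

    emr_client is assumed to expose describe_cluster(ClusterId=...),
    as the boto3 EMR client does.
    """
    response = emr_client.describe_cluster(ClusterId=cluster_id)
    return response["Cluster"]["Status"]["State"]


def is_ready(state):
    """Return True when the state means the cluster has finished starting."""
    return state in READY_STATES
```

While the cluster is being provisioned, the state is typically `STARTING` or `BOOTSTRAPPING`. Taking the client as a parameter keeps the sketch independent of how your credentials are configured.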
Update the hosts file
Once started, each EC2 instance is assigned a public and a private IP address, and a public and a private DNS name.
The cluster nodes are configured using the private DNS. Therefore, you must add the private DNS and private IP of your master node to the hosts file of the Talend Studio instance.
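The hosts file entry can be sketched as follows. This is an illustration only: the helper names `hosts_entry` and `add_entry` are hypothetical, and any IP address and DNS name you pass in must be the actual private values of your master node.

```python
import ipaddress


def hosts_entry(private_ip, private_dns):
    """Build one hosts-file line mapping a private IP to its private DNS name."""
    # Raises ValueError if the address is malformed.
    ipaddress.ip_address(private_ip)
    # Also list the short host name (the part before the first dot), if any.
    short_name = private_dns.split(".", 1)[0]
    names = [private_dns] if short_name == private_dns else [private_dns, short_name]
    return f"{private_ip}\t{' '.join(names)}"


def add_entry(hosts_text, entry):
    """Return hosts-file text with entry appended, unless it is already present."""
    if entry in hosts_text.splitlines():
        return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + entry + "\n"
```

For example, `hosts_entry("10.0.0.12", "ip-10-0-0-12.ec2.internal")` (made-up values) returns a line that maps the address to both the full private DNS name and the short host name. The hosts file is `/etc/hosts` on Linux and macOS and `C:\Windows\System32\drivers\etc\hosts` on Windows. Editing it requires administrator privileges, so the sketch only builds the text and leaves the write to you.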