Working with Hive on an Amazon EMR cluster
This example uses these licensed products provided by Amazon:
- Amazon EC2
- Amazon EMR
For more information about how to launch an Amazon EMR cluster from the Talend Studio, see Amazon EMR - Getting Started.
Create Hive connection metadata
Before you begin
This procedure assumes that you have already launched an Amazon EMR 4.0.0 cluster and configured its cluster metadata in the Talend Repository.
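A Hive connection ultimately points at a HiveServer2 endpoint on the cluster. The sketch below only illustrates the standard HiveServer2 JDBC URL format, jdbc:hive2://host:port/database. It assumes that HiveServer2 runs on the EMR master node on its default port 10000 and that you connect to the default database. The helper name hive_jdbc_url and the host name in the usage line are hypothetical placeholders, not values read from Talend Studio.

```python
# Hypothetical helper: assembles a HiveServer2 JDBC URL.
# Assumptions: HiveServer2 listens on the EMR master node, default port 10000,
# and the target database is "default". Adjust to match your cluster.
def hive_jdbc_url(master_dns: str, port: int = 10000, database: str = "default") -> str:
    """Return a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/db."""
    if not master_dns:
        raise ValueError("master_dns must not be empty")
    if not (0 < port < 65536):
        raise ValueError(f"invalid port: {port}")
    return f"jdbc:hive2://{master_dns}:{port}/{database}"


if __name__ == "__main__":
    # Placeholder host name; replace it with your cluster's master public DNS.
    print(hive_jdbc_url("ec2-203-0-113-10.compute-1.amazonaws.com"))
```

The host and port you enter in the Hive connection metadata should correspond to the host and port parts of this URL.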
Procedure
Create a Hive table
Before you begin
This procedure assumes that a file named CustomersData has already been written to HDFS. You will convert this file to a Hive table.
The following example uses the Hive table creation wizard.
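One common way to expose a delimited HDFS file as a Hive table is an external table whose LOCATION is the HDFS directory that holds the file. Hive reads every file in that directory. The sketch below builds such a CREATE EXTERNAL TABLE statement as a string. It is an illustration under stated assumptions, not the DDL the wizard actually generates: the column names and types, the ';' field delimiter, the table name, and the HDFS directory are all hypothetical, because the structure of CustomersData is not described here.

```python
from typing import List, Tuple


# Hypothetical sketch: builds HiveQL DDL for an external table over a
# delimited text file in HDFS. Columns, delimiter, and location are
# illustrative assumptions, not values taken from CustomersData.
def create_external_table_ddl(
    table: str,
    columns: List[Tuple[str, str]],
    hdfs_dir: str,
    delimiter: str = ";",
) -> str:
    """Return a CREATE EXTERNAL TABLE statement for text files in hdfs_dir."""
    if not columns:
        raise ValueError("at least one column is required")
    # Backquote each column name so reserved words do not break the DDL.
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}'"
    )


if __name__ == "__main__":
    print(create_external_table_ddl(
        "customersdata",
        [("id", "INT"), ("name", "STRING"), ("city", "STRING")],
        "/user/hadoop/customers",  # hypothetical HDFS directory
    ))
```

Because the table is external, dropping it in Hive removes only the table definition and leaves the underlying HDFS data in place.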
Procedure
Run a Hive table analysis
Before you begin
You can use the computing capacity of your cluster to run analyses on your Hive table.
Procedure
Each analysis is sent to your cluster as a HiveQL query and runs as a MapReduce job.
The results are displayed in Talend Studio as charts or tables.
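To give a sense of the kind of HiveQL an analysis can push to the cluster, the sketch below builds a simple column-profiling query that counts rows, NULL values, and distinct values for one column. Hive compiles aggregate queries like this into a MapReduce job when MapReduce is the execution engine. This is a hypothetical illustration, not the query Talend Studio generates. The function name and the table and column names in the usage line are assumptions.

```python
# Hypothetical sketch of a column-profiling query of the kind an analysis
# might send to Hive. This is NOT the query generated by Talend Studio.
def column_profile_query(table: str, column: str) -> str:
    """Return HiveQL computing row, NULL, and distinct-value counts for one column."""
    c = f"`{column}`"
    return (
        "SELECT COUNT(*) AS row_count, "
        f"SUM(CASE WHEN {c} IS NULL THEN 1 ELSE 0 END) AS null_count, "
        f"COUNT(DISTINCT {c}) AS distinct_count "
        f"FROM `{table}`"
    )


if __name__ == "__main__":
    # Hypothetical table and column names.
    print(column_profile_query("customersdata", "city"))
```

All three counts are computed in a single pass over the table, which keeps the work to one query instead of three.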
For more information about other ways to work with tables, see the article Work with Amazon Relational Database Service (RDS).