Connecting a MapR distribution to the Talend Studio using cluster metadata
To run Big Data Jobs, the Talend Studio must be connected to a running Hadoop cluster. You can either configure the connection information in each individual component or store the configuration in metadata in the Repository and reuse it in components as needed.
We’ll take the second approach, which is the more efficient way to configure connection information because the settings are defined once and reused.
You’re now ready to create cluster metadata.
Environment
This article was written and validated using Talend Studio 6.1 to connect to a MapR 5.0 cluster.
Solution
Configuring the Studio to use the MapR client:
MapR Hadoop cluster metadata is created manually. This requires having information about your cluster, such as the Namenode URI, Resource Manager address, or Job Tracker URI, depending on whether you’re using YARN or MapReduce v1. You may also need other information, such as the Job history or Resource Manager scheduler location.
1. In Studio > Repository > Metadata, right-click Hadoop Cluster, then click Create Hadoop Cluster.
2. In the Name box, enter MapRCluster and click Next. The Hadoop Configuration Import Wizard opens.
3. In the Distribution list, select MapR, and in the Version list, select MapR 5.0.0(YARN mode).
4. Select Enter manually Hadoop services and click Finish.
The Hadoop Cluster Connection window opens.
5. Confirm that the distribution information is correct.
A few values, such as the Namenode URI and Resource Manager address, are preconfigured.
Change the localhost value to the IP address or DNS name of your cluster. If the cluster was configured with the default port values, the Namenode and Resource Manager listen on ports 7222 and 8032, respectively.
6. Configure the connection as follows:
Namenode URI: maprfs:///
Resource Manager: <ClusterName>:8032
Resource Manager Scheduler: <ClusterName>:8030
Job History: <ClusterName>:10020
Staging directory: /var/mapr/cluster/yarn/rm/staging
User name: <UserName>
Group name: <GroupName>
7. Check your configuration:
8. Click Check Services to verify the connection to the cluster:
If the progress bars reach 100% with no error message, you’re connected.
9. Click Finish. Your cluster metadata appears in Repository > Metadata > Hadoop Cluster.
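If Check Services reports an error, it can help to confirm that the cluster's service ports are reachable from the machine running the Studio. The following is a minimal Python sketch of such a check, not part of Talend or MapR. It assumes the default ports listed in the steps above, and the function names (`service_endpoints`, `is_reachable`, `check_cluster`) are illustrative only.

```python
import socket

# Default ports taken from the steps above (assumption: your cluster uses the defaults).
DEFAULT_PORTS = {
    "Namenode": 7222,
    "Resource Manager": 8032,
    "Resource Manager Scheduler": 8030,
    "Job History": 10020,
}


def service_endpoints(host, ports=DEFAULT_PORTS):
    """Map each service name to the (host, port) pair the Studio would contact."""
    return {name: (host, port) for name, port in ports.items()}


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_cluster(host, ports=DEFAULT_PORTS):
    """Return a dict of service name -> reachability for the given cluster host."""
    return {
        name: is_reachable(h, p)
        for name, (h, p) in service_endpoints(host, ports).items()
    }
```

Calling `check_cluster` with the IP address or DNS name you entered in step 5 returns one True or False value per service. A False result only means the TCP connection could not be opened from this machine (for example, because of a firewall or a non-default port). It does not replace the Check Services step in the Studio.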
Now that you are successfully connected, you can reuse the metadata in Jobs and components to process your data using the Spark or MapReduce framework.