MapR - Getting Started
Prerequisites
- You have installed and configured MapR 5.1 (or later).
- You have installed Talend Studio.
- The dataset used in this article (pearsonData.csv) is Pearson’s Height Data, named for its creator, Karl Pearson, who founded the discipline of mathematical statistics in the early 1900s.
You can download the Pearson dataset here. You can also use your own data, but some steps in this article will then need to be adjusted.
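Before loading the file into the cluster, it can help to confirm that the download is a well-formed CSV. The following is a minimal sketch, assuming the file is comma-delimited and starts with a header row. The function name summarize_csv is illustrative and not part of any Talend or MapR tooling.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, data_row_count) for a comma-delimited file.

    Assumes the first line is a header row; adjust if your file has none.
    """
    # "utf-8-sig" reads plain UTF-8 and also strips a byte-order mark if present.
    with Path(path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        # Count only non-empty rows so that trailing blank lines are ignored.
        row_count = sum(1 for row in reader if row)
    return header, row_count
```

Calling summarize_csv("pearsonData.csv") returns the column names and the number of data rows, which you can compare against what you expect from the dataset before continuing.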
Installing MapR Sandbox
The easiest way to install MapR is to download the MapR Sandbox. Delivered as a virtual machine, this fully configured environment provides many of the components that make up the Hadoop ecosystem.
The VM is available in two formats: VMware and VirtualBox.
You can find complete instructions on installing and setting up the sandbox here.
Performing the MapR Post-Installation Steps
Create the OS user
- At the VM console, press Alt+F2 and log in as root with the password mapr.
- Add the puccini user with the following command:
adduser -G mapr --home /user/puccini puccini
Create an HDFS user
- The VM console banner page shows the URL used to manage your sandbox. To view the banner page, press Alt+F1 at the console.
- Open a browser, navigate to the URL provided, and launch Hue. The default sandbox username and password are mapr/mapr.
- Click the administration icon in the upper right corner, and click Manage Users.
- Create a user with username puccini and password puccini. Leave the option Create home directory checked.
- Make sure that the user is part of the default group and click Add user to finish.
- Click the Manage HDFS icon in the upper right corner and navigate to the /user directory to check that the puccini user has been created. You should see a directory called puccini.
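As an alternative to checking in Hue, the same verification can be sketched from a shell on a machine where the hadoop command is on the PATH and configured for the cluster (for example, the sandbox VM itself). This is an assumption-laden sketch rather than part of the procedure above: the helper names are hypothetical, and the parsing assumes the usual eight-field layout of hadoop fs -ls output.

```python
import subprocess


def parse_ls_directories(ls_output):
    """Extract directory paths from `hadoop fs -ls` output.

    Directory entries start with a permission string beginning with 'd'.
    The path is taken as the last whitespace-separated field, so paths
    containing spaces are not handled.
    """
    directories = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip the "Found N items" summary line and any non-directory entries.
        if len(fields) >= 8 and fields[0].startswith("d"):
            directories.append(fields[-1])
    return directories


def home_directory_exists(user, parent="/user"):
    """Run `hadoop fs -ls <parent>` and check whether <parent>/<user> is listed."""
    result = subprocess.run(
        ["hadoop", "fs", "-ls", parent],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the listing itself fails
    )
    return f"{parent}/{user}" in parse_ls_directories(result.stdout)
```

With this in place, home_directory_exists("puccini") should return True once the Hue steps have created the home directory.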
Working With HDFS
Prerequisites
For the purposes of this article, only Talend Studio is required.
If Talend Studio is not already installed, use one of the installation options available on the Talend website.
Once installed, start your Talend Studio and create a new project, as described in the How to create a project documentation.
Create the cluster metadata - MapR 5.2
Before you begin
- You have opened Talend Studio.
- You have installed the MapR Client in order to connect to the MapR Sandbox from Talend.
Follow the instructions for your client operating system here. When running the configure script, the default Sandbox cluster name is demo.mapr.com and the IP address can be found on the VM Console banner page (found by pressing Alt+F1 at the console).
For example, on Windows you can run the configure script as follows:
server\configure.bat -N demo.mapr.com -c -C 192.168.111.134:7222
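If the client cannot connect after configuration, a quick first check is whether the host and port passed with -C are reachable from the client machine. The sketch below assumes the address from the example above (192.168.111.134 on port 7222); substitute the IP address shown on your own banner page. The function name cldb_reachable is illustrative only. It tests plain TCP connectivity and says nothing about whether the MapR services behind the port are healthy.

```python
import socket


def cldb_reachable(host, port=7222, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For instance, cldb_reachable("192.168.111.134") returning False suggests a networking problem between the client and the VM (such as the VM network adapter mode or a firewall) rather than a Talend configuration issue.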
Procedure
Create HDFS Metadata - MapR
Before you begin
- You have created a cluster metadata connection.
Procedure
Write Data to HDFS - MapR
Before you begin
- You have created an HDFS connection object that uses the cluster repository connection you just created.