Note! The information here is outdated and the pages will be removed. Please refer to our new Grid documentation page at: http://doc.grid.surfsara.nl/

Hadoop/MapReduce Development Environment

You can develop MapReduce applications in the same manner as you develop other Java applications. Just put the Hadoop libraries on your classpath and you're ready to go. You can use your favorite editor or IDE to develop, test and run your application locally.

This page contains an overview of software we run in our development environment for developing MapReduce applications. To get your own development environment, you have two choices:

  • Download a complete environment as one of our virtual disk images
  • Set up your own environment with (a subset of) the software listed below


Support

If you need help with setting up your development environment, please do not hesitate to contact Evert Lammerts.

Virtual Disk Images

Using virtual machines as a development environment is experimental. Although our experiences have been very good so far, the images might not work for everyone due to software or hardware limitations. If they do not work for you, create a development environment without the virtual disk images we offer.

Our development environment can be downloaded as a VMWare disk and as a VirtualBox disk. People running a 64-bit operating system should be able to use these, provided the hardware supports it. Whichever image you use, make sure to read the notifications that appear when you run your virtual machine.

Image Details

The image runs a 64-bit Ubuntu Linux operating system. You need to specify this when creating the new VM to which you will attach the disk image.

You will be logged in to the image automatically, without needing a password. Should you need them later, the username is "hackathon" and the password is "sara".

VMWare

To use the VMWare disk on Windows or Linux, you need to have the VMWare Player installed. To run it on Mac (Intel architectures only!) you need to have VMWare Fusion installed.

Download the VMWare disk image by clicking here.

After your download has finished and you have unpacked the (tar.bzip2) file, you can use VMWare Player or VMWare Fusion to create a new virtual machine and attach the disk to that machine. Note that you might have to change the virtual machine's resolution to fit your screen better.

VirtualBox

To use the VirtualBox disk image you need to have VirtualBox installed.

Download the VirtualBox disk image by clicking here.

After your download has finished and you have unzipped the file, you can use VirtualBox to create a new virtual machine and attach the disk to that machine.

Software in the SARA MapReduce Development Environment

This is the software we have installed in the disk images provided above:

This documentation does not cover the installation of the Oracle Java JDK 1.6, a text editor or IDE, or a Subversion client.

Downloading Hadoop

The SARA Hadoop prototype cluster has Cloudera's distribution for Hadoop, version 3 beta 3 (CDH3b3) installed. This distribution contains Hadoop version 0.20.2. Although you should be able to use any Hadoop 0.20.2 distribution when developing applications for the prototype cluster, we recommend using Cloudera's distribution: CDH3b3 includes a number of patches and backports from newer versions of Hadoop.

Linux users can install CDH3b3 through their package manager. If your package manager of choice is Aptitude (apt-*), follow steps 1-3 of the Debian section of this page. If you use Yum, refer to steps 1-2 of the RedHat section of that same page. The advantage of installing through a package manager is that you can easily install and play with other Hadoop-related software packages provided by Cloudera, such as HBase and Pig.

Windows and Mac users can download the release from Cloudera as a gzipped tarball and extract it in a location of their choice.

Configuring the classpath

Whether you use an IDE or a text editor, when compiling and running your Hadoop code you must have its libraries on the classpath. To be on the safe side, you need to include:

  • All jars in the Hadoop installation directory (for CDH3b3 that is /usr/lib/hadoop)
  • All jars in the subdirectory lib/ of the Hadoop installation directory, including all jars in subdirectories below that
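
The two rules above can be sketched as a small helper that prints a classpath string. This is only an illustration, not part of the environment we ship: the class name HadoopClasspath and its methods are hypothetical, and the code assumes Java 8 or newer (not the JDK 1.6 mentioned elsewhere on this page). It uses only the standard library and defaults to the CDH3b3 location /usr/lib/hadoop when no argument is given.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not shipped with Hadoop): builds a classpath string
// from a Hadoop installation directory.
class HadoopClasspath {

    // Jars directly in hadoopHome, plus all jars below hadoopHome/lib
    // (including its subdirectories).
    static List<String> collectJars(Path hadoopHome) throws IOException {
        List<String> jars = new ArrayList<>();
        jars.addAll(jarsUnder(hadoopHome, 1));
        jars.addAll(jarsUnder(hadoopHome.resolve("lib"), Integer.MAX_VALUE));
        return jars;
    }

    // Lists *.jar files below dir, descending at most maxDepth levels.
    // Returns an empty list if dir does not exist.
    static List<String> jarsUnder(Path dir, int maxDepth) throws IOException {
        if (!Files.isDirectory(dir)) {
            return Collections.emptyList();
        }
        try (Stream<Path> paths = Files.walk(dir, maxDepth)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Joins the jars with the platform's path separator (":" or ";").
    static String buildClasspath(Path hadoopHome) throws IOException {
        return String.join(File.pathSeparator, collectJars(hadoopHome));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: default to the CDH3b3 installation directory.
        Path home = Paths.get(args.length > 0 ? args[0] : "/usr/lib/hadoop");
        System.out.println(buildClasspath(home));
    }
}
```

The printed string can be passed to the -cp option of javac and java, or you can add the same jars to your IDE's build path by hand.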

Testing your development environment

You can test your development environment by checking out an empty MapReduce project from our Subversion server and running it in your environment.
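
Before running a full project, a quicker sanity check is to see whether a few core Hadoop classes can be loaded at all. The sketch below is one possible way to do that, not the test project mentioned above: the class HadoopClasspathCheck and its methods are hypothetical, and the list of class names is our own selection of classes that ship with Hadoop 0.20.2. It assumes Java 7 or newer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which Hadoop classes cannot be found
// on the current classpath.
class HadoopClasspathCheck {

    // A few classes from the Hadoop 0.20.2 core jar (our own selection).
    static final String[] HADOOP_CLASSES = {
        "org.apache.hadoop.conf.Configuration",
        "org.apache.hadoop.mapreduce.Job",
        "org.apache.hadoop.mapreduce.Mapper"
    };

    // True if the class can be found; the class is not initialized.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false,
                    HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the subset of classNames that could not be loaded.
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isLoadable(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> notFound = missing(HADOOP_CLASSES);
        if (notFound.isEmpty()) {
            System.out.println("Hadoop classes found on the classpath.");
        } else {
            System.out.println("Missing from the classpath: " + notFound);
        }
    }
}
```

Run it with the same classpath you use for your own code. If it reports missing classes, recheck the jars listed under "Configuring the classpath".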

Hadoop documentation and resources