Note! The information here is outdated and the pages will be removed. Please refer to our new Grid documentation page at: http://doc.grid.surfsara.nl/
Hadoop/MapReduce Development Environment
You can develop MapReduce applications in the same manner as you develop other Java applications. Just put the Hadoop libraries on your classpath and you're ready to go. You can use your favorite editor or IDE to develop, test and run your application locally.
This page contains an overview of software we run in our development environment for developing MapReduce applications. To get your own development environment, you have two choices:
- Download a complete environment as one of our virtual disk images, or
- Set up your own environment with (a subset of) the software listed below
If you need help with setting up your development environment, please do not hesitate to contact Evert Lammerts.
Virtual Disk Images
Using virtual machines as a development environment is experimental. Although our experiences have been very good so far, it might not work for everyone because of software or hardware limitations. If the images do not work for you, create a development environment without the virtual disk images we offer.
Our development environment can be downloaded as a VMware disk and as a VirtualBox disk. Anyone running a 64-bit operating system should be able to use these, provided the hardware supports it. Whichever image you use, make sure to read the notifications you get when running your virtual machine.
The image runs a 64-bit Ubuntu Linux operating system. You need to specify this when creating the new VM to which you will attach the disk image.
You will be logged in to the image automatically, without needing a password. Should you be asked for credentials anyway, the username is "hackathon" and the password is "sara".
After your download has finished and you have unzipped the (tar.bzip2) file, you can use VMware Player or VMware Fusion to create a new virtual machine, and attach the disk to that machine. Note that you might have to change the virtual machine's resolution to fit your screen better.
To use the VirtualBox disk image you need to have VirtualBox installed.
After your download has finished and you have unzipped the file, you can use VirtualBox to create a new virtual machine, and attach the disk to that machine.
Software in the SARA MapReduce Development Environment
This is the software we have installed in the disk images provided above:
- Oracle JDK 1.6
- A text editor or IDE (we use Eclipse Helios)
- The Hadoop 0.20.2 libraries
- Optional: a Subversion client (supporting SVN servers version 1.5)
- Optional: an empty MapReduce project (see below)
This documentation does not cover the installation of the Oracle JDK 1.6, a text editor or IDE, or a Subversion client.
The SARA Hadoop prototype cluster has Cloudera's distribution for Hadoop, version 3 beta 3 (CDH3b3) installed. This distribution contains Hadoop version 0.20.2. Although you should be able to use any Hadoop 0.20.2 distribution when developing applications for the prototype cluster, we recommend using Cloudera's distribution: CDH3b3 includes a number of patches and backports from newer versions of Hadoop.
Linux users can install CDH3b3 through their package manager. If your package manager of choice is Aptitude (apt-*), follow steps 1-3 of the Debian section of this page. If you use Yum, refer to steps 1-2 of the RedHat section of that same page. The advantage of this approach is that you can easily install and experiment with other Hadoop-related software packages provided by Cloudera, such as HBase and Pig.
Windows and Mac users can download the release from Cloudera as a gzipped tarball, and extract it to a location of their choice.
Configuring the classpath
Whether you use an IDE or a text editor, when compiling and running your Hadoop code you must have its libraries on the classpath. To be on the safe side, you need to include:
- All jars in the Hadoop installation directory (for CDH3b3 that is /usr/lib/hadoop)
- All jars in the subdirectory lib/ of the Hadoop installation directory, including all jars in subdirectories below that
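The two rules above can be turned into a classpath string mechanically. The following is a minimal sketch, not part of the SARA environment: the class name HadoopClasspath and its methods are hypothetical, and the default directory /usr/lib/hadoop is the CDH3b3 location mentioned above. It collects the jars directly in the installation directory plus every jar below lib/, and joins them with the platform's path separator.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: prints a classpath for a Hadoop installation directory.
public class HadoopClasspath {

    // Recursively adds every *.jar file below dir to jars.
    static void collectJarsRecursively(File dir, List<String> jars) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // dir does not exist, is not a directory, or is unreadable
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                collectJarsRecursively(entry, jars);
            } else if (entry.getName().endsWith(".jar")) {
                jars.add(entry.getAbsolutePath());
            }
        }
    }

    // Jars directly in hadoopHome, plus all jars in lib/ and its subdirectories.
    static String buildClasspath(File hadoopHome) {
        List<String> jars = new ArrayList<String>();
        File[] topLevel = hadoopHome.listFiles();
        if (topLevel != null) {
            for (File entry : topLevel) {
                if (entry.isFile() && entry.getName().endsWith(".jar")) {
                    jars.add(entry.getAbsolutePath());
                }
            }
        }
        collectJarsRecursively(new File(hadoopHome, "lib"), jars);
        Collections.sort(jars); // stable, reproducible ordering

        StringBuilder classpath = new StringBuilder();
        for (String jar : jars) {
            if (classpath.length() > 0) {
                classpath.append(File.pathSeparator);
            }
            classpath.append(jar);
        }
        return classpath.toString();
    }

    public static void main(String[] args) {
        // Assumption: default to the CDH3b3 location unless a path is given.
        String home = args.length > 0 ? args[0] : "/usr/lib/hadoop";
        System.out.println(buildClasspath(new File(home)));
    }
}
```

The printed string can be passed to the -classpath option of javac and java. If the directory does not exist, the result is an empty string, which is a quick hint that the installation path is wrong.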
Testing your development environment
You can test your development environment by checking out an empty MapReduce project from our Subversion server and running it in your environment.
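As a quicker sanity check before checking out the project, you could verify that the Hadoop classes are visible on your classpath at all. The sketch below is an illustration only and is not part of the SARA project: the class name HadoopClasspathCheck is hypothetical, and the three class names it looks for are core classes shipped in Hadoop 0.20.2.

```java
// Hypothetical helper: reports whether core Hadoop classes can be loaded.
public class HadoopClasspathCheck {

    // Returns true if the named class can be loaded from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // the class itself is missing
        } catch (LinkageError e) {
            return false; // the class was found, but one of its dependencies is missing
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.mapreduce.Mapper",
            "org.apache.hadoop.mapreduce.Job"
        };
        boolean allFound = true;
        for (String className : required) {
            boolean found = isOnClasspath(className);
            System.out.println((found ? "found:   " : "MISSING: ") + className);
            allFound = allFound && found;
        }
        // A non-zero exit code makes the check usable from scripts.
        System.exit(allFound ? 0 : 1);
    }
}
```

Run it with the same classpath you use for your MapReduce code. A "MISSING" line usually means a jar from the Hadoop installation directory or its lib/ subdirectory is not included.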
Hadoop documentation and resources
- The Hadoop API docs. Use their URL in your IDE to attach the Javadocs to the hadoop-core.jar library.
- The Apache Hadoop homepage
- search-hadoop.com, a useful search engine combining many online Hadoop-related resources.
- The Hadoop mailing lists: mapreduce-user, hdfs-user, common-user, hadoop-general and cdh-user
- The SARA SVN server containing some sample resources (NOTE: this is a work in progress)