Monday, December 22, 2008

GGBLOG – Just In Time MapReduce with OSGi

A couple of years ago I was thrown into a team lead role where I was responsible for distributing workload across a number of developers. During the first few months I found that as the workload increased, so did my problems in trying to scale the team. It took me a few more months to figure out that my team leading technique needed to be turned upside down.

Instead of pushing the work out, I found it was much better to set up a work queue and have the developers pick up tasks from it. With this very minor change in technique I was able to double the team’s size without taking stress leave.

I’ve taken this concept and attempted to implement it on OSGi, borrowing heavily from the world-famous MapReduce research made available by the ingeniously generous folk over at Google.
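
To make the pull idea concrete, here is a minimal plain-Java sketch. It is not the code from the demo and it does not use OSGi. The class name PullQueueSketch, the runAll method and the STOP marker are all made up for this illustration. The point is that nobody assigns tasks to workers: each worker takes the next task from a shared queue whenever it is free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a pull-based work queue (not the demo's code).
class PullQueueSketch {

    // Marker task that tells a worker there is nothing left to pull.
    private static final Runnable STOP = () -> { };

    /** Runs every task on workerCount threads; returns tasks completed per worker. */
    static Map<String, Integer> runAll(List<Runnable> tasks, int workerCount) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(tasks);
        // One STOP marker per worker, queued behind the real tasks.
        for (int i = 0; i < workerCount; i++) {
            queue.add(STOP);
        }

        Map<String, Integer> completed = new ConcurrentHashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            String name = "worker-" + i;
            completed.put(name, 0);
            Thread worker = new Thread(() -> {
                try {
                    // The worker pulls its next task whenever it is free.
                    for (Runnable task = queue.take(); task != STOP; task = queue.take()) {
                        task.run();
                        completed.merge(name, 1, Integer::sum);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, name);
            workers.add(worker);
            worker.start();
        }

        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        AtomicInteger ran = new AtomicInteger();
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            tasks.add(ran::incrementAndGet);
        }
        Map<String, Integer> perWorker = runAll(tasks, 3);
        System.out.println("tasks run: " + ran.get() + ", split across workers: " + perWorker);
    }
}
```

How the 20 tasks are split between the three workers varies from run to run. That is the appeal of pulling: faster or less busy workers simply end up taking more tasks.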

If you’d like to get my demo running or want to know what the most popular starting letter is on your favorite web pages, you can always follow the steps below. The steps should take around 5 – 10 minutes to complete.

Prerequisites

1) Install Eclipse 3.4 (http://www.eclipse.org/)

2) Install the Rich Ajax Platform, RAP (http://www.eclipse.org/rap/gettingstarted.php)

Running the Demo

Download the mapreduce.zip example source code from:

https://sourceforge.net/project/showfiles.php?group_id=228168&package_id=303652

Open two instances of Eclipse and create two new workspaces, e.g. Node1 and Node2.

In both instances configure RAP as the target platform.

Click Window -> Preferences -> Plug-in Development -> Target Platform, then select the RAP target platform location, which is at [ECLIPSE_HOME]/configuration/org.eclipse.rap.target-1.1.1/eclipse

Import the demo code into both instances of Eclipse.

Click File -> Import -> Existing Projects into Workspace -> Select archive file, then click Browse and select the mapreduce.zip file you downloaded earlier.

In both instances of Eclipse, expand galang.research.rap.hello, double-click plugin.xml, then click launch RAP application.

In either Eclipse instance, click Add URL content to memory.

To add a few more pages, paste the following URLs and click Add URL content to memory after each one.

http://cnn.com

http://slashdot.com

http://engadget.com

http://smh.com.au


Once you’ve added the above URLs, click run map reduce. If both instances of Eclipse show the map reduce output, you’ve set up the demo as expected.

What’s going on under the covers?

A wise/lazy man once said a picture says a thousand words, so below is a sequence diagram of what is going on under the hood. You can also step through the code if diagrams aren’t your thing.

Can you use this algorithm for anything else?

I think there are more uses for this algorithm than just counting the starting characters of words on web pages. You could use it to run a lot of SQL queries in parallel and then reduce the output. If you want to do more with this, I’d recommend playing around with MyMapper.java and MyReducer.java.
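
For anyone who wants a feel for the map and reduce steps before opening the demo source, here is a small standalone Java sketch of the starting-letter count. It is an assumption-laden illustration and not the contents of MyMapper.java or MyReducer.java: the class name FirstLetterCountSketch and the map and reduce signatures are invented here, and the pages are hard-coded strings rather than fetched URLs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch of a starting-letter count (not the demo's MyMapper/MyReducer).
class FirstLetterCountSketch {

    /** Map step: count the lowercased starting letter of each word in one page's text. */
    static Map<Character, Integer> map(String pageText) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (String word : pageText.split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace produces one empty token
            }
            char first = Character.toLowerCase(word.charAt(0));
            if (Character.isLetter(first)) {
                counts.merge(first, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Reduce step: add the partial counts from every page into one total per letter. */
    static Map<Character, Integer> reduce(List<Map<Character, Integer>> partials) {
        Map<Character, Integer> totals = new TreeMap<>();
        for (Map<Character, Integer> partial : partials) {
            partial.forEach((letter, n) -> totals.merge(letter, n, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> pages = List.of("the quick brown fox", "The lazy dog", "two tired turtles");
        // Each page is mapped independently, so the map step can run in parallel.
        List<Map<Character, Integer>> partials = pages.parallelStream()
                .map(FirstLetterCountSketch::map)
                .collect(Collectors.toList());
        System.out.println(reduce(partials)); // prints {b=1, d=1, f=1, l=1, q=1, t=5}
    }
}
```

The same shape would fit the SQL idea: the map step would run one query and return its partial result, and the reduce step would combine the partial results.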

I’d love to throw this code on a large number of nodes to see how quickly I can get it to run with gigabytes of data. If you’re a big iron or grid/cloud vendor with a few hundred nodes to spare, you are more than welcome to drop me a line :) (glenn.galang@gmail.com).

2 comments:

Steve Hanov said...

Hi, I'm just googling for everything to do with sequence diagrams and adding comments to promote my tool www.websequencediagrams.com.

I think it could have let you draw your diagram a lot more quickly.

lluis said...

The diagram shown here is created with Enterprise Architect, a very good UML tool. The language to express UML is already invented, it's called XMI and it's a standard. Happy new year!