Showing posts with label Cloud. Show all posts

Monday, December 22, 2008

GGBLOG – Just In Time MapReduce with OSGi

A couple of years ago I was thrown into a team lead role where I was responsible for distributing workload across a number of developers. During the first few months I found that as the workload increased, so did my problems in trying to scale the team. It took me a few more months to figure out that my team-leading technique needed to be turned upside down.

Instead of pushing the work out, I found it was much better to set up a work queue and have the developers pick up tasks from it. With this very minor change in technique I was able to double the team’s size without taking stress leave.

I’ve taken this concept and attempted to implement it on OSGi whilst borrowing heavily from the world famous MapReduce research made available by the ingeniously generous folk over at Google.
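The pull model described above can be sketched in plain Java. This is only an illustration of the idea, not the demo's OSGi code: the class name PullQueueSketch and the method runAll are made up for this example, and threads stand in for the developers (or nodes) that pick work off a shared queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: workers pull tasks from one shared queue
// instead of having a coordinator push tasks at them.
class PullQueueSketch {

    /** Runs all tasks on `workers` threads and returns how many tasks each worker finished. */
    static Map<String, Integer> runAll(List<Runnable> tasks, int workers) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(tasks);
        Map<String, Integer> completed = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            String name = "worker-" + i;
            pool.execute(() -> {
                Runnable task;
                // poll() returns null once the queue is empty, which ends this worker.
                while ((task = queue.poll()) != null) {
                    task.run();
                    completed.merge(name, 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed;
    }

    public static void main(String[] args) {
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            tasks.add(() -> { /* pretend to do some work */ });
        }
        // The split between workers varies from run to run; the total is always 20.
        System.out.println(runAll(tasks, 3));
    }
}
```

A faster worker simply comes back to the queue sooner, so nobody has to decide up front who gets which task.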

If you’d like to get my demo running or want to know what the most popular starting letter is on your favorite web pages, you can always follow the steps below. The steps should take around 5 – 10 minutes to complete.

Prerequisites

1) Install Eclipse 3.4 (http://www.eclipse.org/)

2) Install the Rich Ajax Platform, RAP (http://www.eclipse.org/rap/gettingstarted.php)

Running the Demo

Download the mapreduce.zip example source code from:

https://sourceforge.net/project/showfiles.php?group_id=228168&package_id=303652

Open 2 instances of Eclipse and create 2 new workspaces, e.g. Node1 and Node2.

In both instances, configure RAP as the target platform.

Click Window -> Preferences -> Plug-in Development -> Target Platform -> select the RAP target platform location, which is located at [ECLIPSE_HOME]/configuration/org.eclipse.rap.target-1.1.1/eclipse

Import the demo code into both instances of Eclipse.

Click File -> Import -> Existing Projects into Workspace -> Select archive file -> click Browse and select the mapreduce.zip file which you downloaded earlier.

In both instances of Eclipse, expand galang.research.rap.hello -> double click plugin.xml, then click launch RAP application.

In either Eclipse instance, click Add URL content to memory.

To add a few more pages, paste the following URLs and click Add URL content to memory after each one.

http://cnn.com

http://slashdot.com

http://engadget.com

http://smh.com.au


Once you’ve added the above URLs, click run map reduce. If you see both instances of Eclipse showing the map reduce output, you’ve set up the demo as expected.

What’s going on under the covers?

A wise/lazy man once said a picture says a thousand words, so below is a sequence diagram of what is going on under the hood. You can also step through the code if diagrams aren’t your thing.

Can you use this algorithm for anything else?

I think there are more uses for this algorithm than just counting the starting characters of words on web pages. You could use it to run a lot of SQL queries in parallel and then reduce the output. If you want to do more with this, I’d recommend playing around with MyMapper.java and MyReducer.java.
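For a feel of the map and reduce steps without opening the project, here is a minimal single-JVM sketch of the starting-letter count. It is not the contents of MyMapper.java or MyReducer.java; the class FirstLetterCount and its method names are assumptions made for this example, and the "pages" are plain strings rather than downloaded URLs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a first-letter count expressed as map + reduce.
class FirstLetterCount {

    /** Map step: emit the lower-cased first letter of every word in one page. */
    static List<Character> map(String pageText) {
        List<Character> letters = new ArrayList<>();
        for (String word : pageText.split("\\s+")) {
            // Skip empty tokens and words that start with a digit or symbol.
            if (!word.isEmpty() && Character.isLetter(word.charAt(0))) {
                letters.add(Character.toLowerCase(word.charAt(0)));
            }
        }
        return letters;
    }

    /** Reduce step: merge the per-page outputs into one count per letter. */
    static Map<Character, Integer> reduce(List<List<Character>> mapped) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (List<Character> letters : mapped) {
            for (char c : letters) {
                counts.merge(c, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Runs map over every page, then reduces the results. */
    static Map<Character, Integer> mapReduce(List<String> pages) {
        List<List<Character>> mapped = new ArrayList<>();
        for (String page : pages) {
            mapped.add(map(page));
        }
        return reduce(mapped);
    }

    public static void main(String[] args) {
        // prints {b=1, f=2, q=1, t=4}
        System.out.println(mapReduce(List.of("the quick brown fox", "tea time for two")));
    }
}
```

Because each call to map only looks at its own page, the map calls could run on different nodes; only the reduce step needs to see all the outputs.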

I’d love to throw this code on a large number of nodes to see how quickly I can get it to run with gigabytes of data. If you’re a big iron or grid/cloud vendor with a few hundred nodes to spare, you are more than welcome to drop me a line :) (glenn.galang@gmail.com).

Saturday, October 25, 2008

GGBLOG - Trading Time

Just got back from the Trading and Investing Expo here in Sydney, and boy was that the best $15 that I've spent in a while.

There were some pretty heavy-hitting speakers throughout the day. Below are a few that really caught my attention:

1) Larry Williams

Larry is a high profile American speaker who has traded futures & commodities for over 40 years. He is the creator of the “Darlings of the Dow” concept and fund. He is the winner of the World Cup real time Trading Championship with +11,000%.

2) Joe DiNapoli

Joe’s exhaustive investigations into Displaced Moving Averages, his creation of the proprietary Oscillator Predictor and MACD Predictor and, in particular, his practical and unique method of applying Fibonacci ratios to the price axis make him one of today’s most sought after experts.

What does this have to do with technology?

In the financial markets industry, a couple of milliseconds can mean the difference between winning or losing millions of dollars. I really think cloud computing and grid technologies have a great use case in this domain.

My goal is to learn how to think and breathe like the best traders in the world and then marry that up with my technology skills. I'm already finding, early in my financial markets learning, that I possess some analytical skills which I can leverage.

Friday, October 24, 2008

GGBLOG - In The Clouds

Today I'd like to talk about EC2 (Elastic Compute Cloud), which is Amazon's implementation of a compute cloud, which is really just a bunch of computers linked together (a cluster).

Where it gets more interesting than a plain old compute cluster is the E (elastic) part.

Amazon gives you a nice set of tools to bring computing nodes online on demand, which is why they call it elastic.

I remember back in my days as a web developer when the 9/11 attacks occurred. Even though we had 12 servers in our cluster to do static content serving, there was such a spike in load that the poor operations guy had to rush down to our hardware vendor, pick up some new machines and plug them into the cluster. This whole process took hours. It would have been pretty nice to have EC2 back then to bring the nodes online through a nifty little Firefox extension.


There is a lot of energy around grid based frameworks like GridGain, Hadoop, Coherence and GigaSpaces. These frameworks help you take advantage of EC2 by letting you scale your applications up and out, with close to linear scalability as the goal.

What does this mean?

Most applications out there are built on tiered and layered architectures. As an example (see pic below), a J2EE app typically has a web browser, web server, app server, integration server and then a back-end server. That's an awful lot of services/servers to go through to get any work done.

In my experience the biggest problem with this architecture is that there is always a glass ceiling when it comes to the scalability of these solutions, e.g. the database or SAP runs out of capacity.

Grid based frameworks attempt to turn this on its head by allowing you to break your applications into a structure which fully supports parallel processing.
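As a rough picture of what "a structure which supports parallel processing" can look like, the sketch below splits a job into independent partitions, processes each one on its own thread and then combines the partial results. It uses only the standard java.util.concurrent classes rather than any of the grid frameworks named above, and the class PartitionedSum and method parallelSum are hypothetical names chosen for this example. On a real grid the partitions would go to separate nodes instead of threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: split the data, work on each partition in parallel, combine.
class PartitionedSum {

    /** Sums `data` using `partitions` independent jobs (partitions must be at least 1). */
    static long parallelSum(int[] data, int partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            int chunk = (data.length + partitions - 1) / partitions; // ceiling division
            for (int p = 0; p < partitions; p++) {
                final int from = Math.min(data.length, p * chunk);
                final int to = Math.min(data.length, from + chunk);
                // Each job touches only its own slice, so no locking is needed.
                jobs.add(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                });
            }
            long total = 0;
            // invokeAll blocks until every partition has finished.
            for (Future<Long> partial : pool.invokeAll(jobs)) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        // prints 5050
        System.out.println(parallelSum(data, 4));
    }
}
```

The point of the structure is that adding capacity means adding partitions and workers, rather than buying a bigger single server at the bottom of the stack.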