Grid Computing

Amazon EC2

Amazon EC2 heralds the arrival of computing-as-a-utility-service. Unlike previous efforts by Sun, IBM, Nortel, and others, Amazon has hit on the right offering in features, pricing, and provisioning. While there will no doubt be plenty of copycats, Amazon has such a head start that the community that builds up around EC2 will give them a big advantage.

I expect Amazon will become the eBay of utility computing services. That is somewhat surprising considering that Google is seen as the leader in networked computing technology (see also How Google Works), but Google seriously lags in turning that technology into innovative products.

Other folks think EC2 is hot too:

http://www.cambrianhouse.com/idea-explorer/idea-promoter/ideas-id/VlD2zl1/

http://www.virtualization.info/2006/08/amazon-launches-xen-powered-virtual.html

http://marklogic.blogspot.com/2006/11/web-20-summit-jeff-bezos.html

http://aws.typepad.com/aws/

http://www.zefhemel.com/archives/2007/03/04/webfs

http://www.iunknown.com/xml/atom/article/791/feed.xml

Carr on Grid
Reviews an article in Grid Today that discusses business model issues in utility/elastic computing.

Applications

Applications I'd like to see available on-demand @ EC2:

Mark Logic Content Server - Enterprise Edition

Mathematica Personal Grid Edition

Nutch Appliance
http://www.baynote.org/
Mashery
Developer API provisioning and metering service.
Article on API metering
http://blog.programmableweb.com/2007/04/02/12-ways-to-limit-an-api/

Freeswitch VOIP appliance is EC2 compatible.

Apple XGrid

http://www.macresearch.org/openmacgrid

One of my bright ideas is a Dashboard widget to dial up EC2 instances that are then accessible via XGrid.

http://mekentosj.com/widgets/xgrid/

Integration with OS X Server Xgrid controllers is a natural for EC2 also.

http://www.apple.com/server/macosx/features/xgrid.html

http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Xgrid for non-Mac

Xgrid works for any platform.

http://unu.novajo.ca/simple/archives/000026.html

Java Toolkits

Apache Hadoop

Spawned from Lucene Nutch (wiki).

Globus Toolkit

http://www.globus.org/toolkit/

http://workspace.globus.org/

Terracotta
Formerly proprietary clustering solution, now OSS with many features.

WEKA

http://www.cs.waikato.ac.nz/ml/weka/

Weka-Parallel http://weka-parallel.sourceforge.net/ - parallel processing for Weka.

Grid Weka http://smi.ucd.ie/~rinat/weka/ - grid computing with Weka.

weka4WS http://grid.deis.unical.it/weka4ws - distributed data mining.

Essence

http://www.jtoolkit.org/ is a clustered/shared collections implementation for Java.

A nice idea, but it would probably be more interesting if implemented using Globus Toolkit, or even JINI/JavaSpaces/Blitz, rather than JCache.

Taverna

Taverna
A comprehensive bioinformatics workflow management tool.
myGrid
A turn-key grid service running Taverna.
myExperiment
Sharing myGrid workflows.

FireFish

http://dsd.lbl.gov/~hoschek/firefish/
"Firefish is a Peer-to-Peer Grid service infrastructure for access to dynamic data featuring ease-of-use, interoperability, scalability and performance."

SmartFrog

http://www.smartfrog.org/
is a grid toolkit with load-sensitive provisioning support.

ICE

http://www.zeroc.com/icej.html

Management GUI

http://www.openmosixview.com/

Distributed Shells

http://www.csm.ornl.gov/torc/C3/Man/cexec.shtml

http://www.globus.org/toolkit/docs/4.0/contributions/javacog/JavaCoG_Release_Notes.html

http://wiki.cogkit.org/index.php/CoG_Shell

http://forge.objectweb.org/projects/clif/

Our Scripting Future
http://www.ddj.com/dept/java/197002917

I don't agree with the anti-Java rant (nor, obviously, do the Lucene/Nutch/Hadoop folks). Java's start-up time is definitely an issue for scripts running on JIT-compiled Java engines, but the solution isn't to avoid Java; it is to use precompilers for the engines. The article is by the CTO of http://activegrid.com/.

Data Grid

Data mining operations (as opposed to compute-centered stuff like rendering and proofs) also have big data needs.

The Alexa Web Search platform (which is also an Amazon service, as is The Internet Archive) has a nice crawl of the web and also offers grid computing services for it. But their pricing is 10x that of EC2.

EC2 Web/Data Co-op

In the event that Amazon doesn't make Alexa data available at EC2 prices, a great solution would be to implement a shared web caching proxy for EC2 using S3. It would work as a web search/data mining co-op that reduces everyone's Internet transfer costs by sharing the S3 costs for the caching proxy. The co-op would also share the storage costs for big data sets like Google's trillion trigrams, USPTO, US Census Bureau, GIS, etc.
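
A rough sketch of how such a co-op cache might behave, written in Python. Everything named here is an assumption made for illustration: the SharedWebCache class, the key_for scheme, and the split_cost helper are hypothetical, a plain dict stands in for the shared S3 bucket, and the fetch function stands in for a real HTTP download. The point is only that once one member has pulled a page, the others read it from the shared store, and the shared bill can be divided by usage.

```python
import hashlib
from typing import Callable, Dict, Optional


class SharedWebCache:
    """Cache-aside lookup keyed by a hash of the URL.

    `store` stands in for a shared S3 bucket (assumption: any dict-like
    object); `fetch` stands in for a real HTTP download.
    """

    def __init__(self, fetch: Callable[[str], bytes],
                 store: Optional[Dict[str, bytes]] = None):
        self.fetch = fetch
        self.store = {} if store is None else store
        self.origin_fetches = 0  # downloads this member made from the Internet

    @staticmethod
    def key_for(url: str) -> str:
        # Hypothetical key scheme: a fixed prefix plus the SHA-256 of the URL.
        return "cache/" + hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> bytes:
        key = self.key_for(url)
        if key not in self.store:
            # Cache miss: pay for the Internet transfer once, then share it.
            self.store[key] = self.fetch(url)
            self.origin_fetches += 1
        return self.store[key]


def split_cost(total_cost: float, bytes_used: Dict[str, int]) -> Dict[str, float]:
    """Divide a shared bill among members in proportion to bytes served."""
    total = sum(bytes_used.values())
    if total == 0:
        return {member: 0.0 for member in bytes_used}
    return {member: total_cost * n / total for member, n in bytes_used.items()}
```

Two members holding the same store object would share hits: the second get() for a URL returns the stored bytes without calling fetch again. Proportional billing is just one possible policy; a flat split or a per-request charge would work too.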

Freebase

Danny Hillis knows a thing or two about grid computing.

http://www.freebase.com/

Elastic Compute Services

Elastic Live
Uses Enomalism for provisioning.
Distributed Potential
Uses Elastic Live and is priced at $0.50 per CPU-hour.

JBoss on EC2

http://www.redhat.com/about/news/prarchive/2008/jboss_amazon.html

Akamai

"Akamai Gains Traction with Web Application Acceleration Service", eWeek. Minimum commitment is $10,000 per month.

Akamai Edge service with IBM WebSphere

http://www.alphaworks.ibm.com/tech/edgecomputing

http://www.akamai.com/html/technology/edgecomputing.html

IBM Deep Computing on Demand

http://www.physorg.com/news101658823.html

Google Big Data transfer

Google is working with scientific users to schlep big data sets around, and they're interested in making that data available publicly.

http://news.bbc.co.uk/2/hi/technology/6425975.stm

Resources

http://swik.net/clustering+Programming

http://www.gridblog.com/index.php?id=C0_17_1

http://www.gridpp.ac.uk/gas/

Beowulf on EC2
http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3.html
Red Hat Cluster Suite
http://www.redhat.com/docs/manuals/csgfs/
Presentations from 2007 Xen Summit
http://www.xensource.com/xen/xensummit.html

DNS

Using ZoneEdit with EC2

Cool Apps (SunGrid)

DistributedIndex
Is an OSS Java document indexer using Spring.

Bloggers

http://ianfoster.typepad.com/blog/newtech/index.html
Grid Watch articles
http://www-128.ibm.com/developerworks/grid/library/gr-watchcol.html