Grid Computing
Amazon EC2
Amazon EC2 heralds the arrival of computing as a utility service. Unlike previous efforts by Sun, IBM, Nortel, and others, Amazon has hit on the right offering in features, pricing, and provisioning. While there will no doubt be plenty of copycats, Amazon has such a head start that the community building up around EC2 will give them a big advantage.
I expect Amazon will become the eBay of utility computing services. That is somewhat surprising considering that Google is seen as the leader in networked computing technology (see also How Google Works), but Google seriously lags in turning that technology into innovative products.
Other folks think EC2 is hot too:
http://www.cambrianhouse.com/idea-explorer/idea-promoter/ideas-id/VlD2zl1/
http://www.virtualization.info/2006/08/amazon-launches-xen-powered-virtual.html
http://marklogic.blogspot.com/2006/11/web-20-summit-jeff-bezos.html
http://www.zefhemel.com/archives/2007/03/04/webfs
http://www.iunknown.com/xml/atom/article/791/feed.xml
- Carr on Grid
- Reviews an article in Grid Today that discusses business model issues in utility/elastic computing.
Applications
Applications I'd like to see available on-demand @ EC2:
Mark Logic Content Server - Enterprise Edition
Mathematica Personal Grid Edition
- Nutch Appliance
- http://www.baynote.org/
- Mashery
- Developer API provisioning and metering service.
- Article on API metering
- http://blog.programmableweb.com/2007/04/02/12-ways-to-limit-an-api/
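To make the metering idea concrete, here is a minimal sketch of one common way to limit an API: a per-key quota over a fixed time window, with a usage counter that could feed billing. The class and parameter names (`FixedWindowMeter`, `quota`, `window_seconds`) are my own for illustration; this is not Mashery's API or the approach of the linked article.

```python
import time
from collections import defaultdict


class FixedWindowMeter:
    """Count calls per API key and reject calls over a per-window quota.

    Hypothetical helper for illustration only.
    """

    def __init__(self, quota, window_seconds, clock=time.monotonic):
        self.quota = quota                  # max accepted calls per window
        self.window_seconds = window_seconds
        self.clock = clock                  # injectable so tests can fake time
        self._windows = {}                  # api_key -> [window_start, calls]
        self.usage = defaultdict(int)       # api_key -> total accepted calls

    def allow(self, api_key):
        """Return True and record the call if the key is under quota."""
        now = self.clock()
        window = self._windows.get(api_key)
        # Start a fresh window for a new key or once the old window expires.
        if window is None or now - window[0] >= self.window_seconds:
            window = [now, 0]
            self._windows[api_key] = window
        if window[1] >= self.quota:
            return False
        window[1] += 1
        self.usage[api_key] += 1
        return True
```

A service front end would call `allow(api_key)` before handling each request and read `usage` when it is time to bill. A fixed window is the simplest scheme; it allows bursts at window boundaries, which a sliding window or token bucket would smooth out.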
Freeswitch VOIP appliance is EC2 compatible.
Apple XGrid
http://www.macresearch.org/openmacgrid
One of my bright ideas is a Dashboard widget to dial up EC2 instances that are then accessible via XGrid.
http://mekentosj.com/widgets/xgrid/
Integration with OS X Server Xgrid controllers is a natural for EC2 also.
http://www.apple.com/server/macosx/features/xgrid.html
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/
Xgrid for non-Mac
Xgrid works for any platform.
http://unu.novajo.ca/simple/archives/000026.html
Java Toolkits
Apache Hadoop
Spawned from Lucene Nutch (wiki).
Globus Toolkit
http://www.globus.org/toolkit/
- Terracotta
- Formerly proprietary clustering solution, now OSS with many features.
WEKA
http://www.cs.waikato.ac.nz/ml/weka/
Weka-Parallel http://weka-parallel.sourceforge.net/ - parallel processing for Weka.
Grid Weka http://smi.ucd.ie/~rinat/weka/ - grid computing with Weka.
weka4WS http://grid.deis.unical.it/weka4ws - distributed data mining.
Essence
http://www.jtoolkit.org/ is a clustered/shared collections implementation for Java. A nice idea, but it would probably be more interesting implemented using Globus Toolkit, or even JINI/JavaSpaces/Blitz, rather than JCache.
Taverna
- Taverna
- A comprehensive bioinformatics workflow management tool.
- myGrid
- A turn-key grid service running Taverna.
- myExperiment
- Sharing myGrid workflows.
FireFish
- http://dsd.lbl.gov/~hoschek/firefish/
- "Firefish is a Peer-to-Peer Grid service infrastructure for access to dynamic data featuring ease-of-use, interoperability, scalability and performance."
SmartFrog
- http://www.smartfrog.org/
- A grid toolkit with load-sensitive provisioning support.
ICE
http://www.zeroc.com/icej.html
Management GUI
http://www.openmosixview.com/
Distributed Shells
http://www.csm.ornl.gov/torc/C3/Man/cexec.shtml
http://www.globus.org/toolkit/docs/4.0/contributions/javacog/JavaCoG_Release_Notes.html
http://wiki.cogkit.org/index.php/CoG_Shell
http://forge.objectweb.org/projects/clif/
- Our Scripting Future
- http://www.ddj.com/dept/java/197002917
I don't agree with the anti-Java rant (nor, obviously, do the Lucene/Nutch/Hadoop folks). Java's start-up time is definitely an issue for scripts running on JIT-compiled Java engines, but the solution is not to avoid Java; it is to use precompilers for those engines. The article is by the CTO of http://activegrid.com/.
Data Grid
Data mining operations (as opposed to compute-centered stuff like rendering and proofs) also have big data needs.
The Alexa Web Search platform (which is also an Amazon service, as is The Internet Archive) has a nice crawl of the web and also offers grid computing services for it. But their pricing is 10x that of EC2.
EC2 Web/Data Co-op
In the event that Amazon doesn't make Alexa data available at EC2 prices, a great solution is to implement a shared web caching proxy using S3 for EC2. It would work as a web search/data mining co-op that reduces everyone's Internet transfer costs by sharing the S3 costs for the caching proxy. The co-op would also share the storage costs for big data sets like Google's trillion trigrams, USPTO, US Census Bureau, GIS, etc.
Freebase
Danny Hillis knows a thing or two about grid computing.
Elastic Compute Services
- Elastic Live
- Uses Enomalism for provisioning.
- Distributed Potential
- Uses Elastic Live and is priced at $0.50 per CPU/hour.
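For a feel of what that rate means in practice, here is a back-of-the-envelope sketch. The only figure taken from the listing above is $0.50 per CPU/hour; the function name and the example job size are my own assumptions, and real bills may include storage, transfer, or minimum charges not modeled here.

```python
RATE_PER_CPU_HOUR = 0.50  # US dollars, the Distributed Potential rate quoted above


def job_cost(cpus, hours, rate=RATE_PER_CPU_HOUR):
    """Return the dollar cost of running `cpus` CPUs for `hours` hours each.

    Hypothetical helper: assumes simple linear pricing with no extras.
    """
    if cpus < 0 or hours < 0:
        raise ValueError("cpus and hours must be non-negative")
    return cpus * hours * rate


# Example: an 8-CPU job that runs for a full day.
print(f"${job_cost(8, 24):.2f}")
```

Passing a different `rate` makes it easy to compare providers on the same job.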
JBoss on EC2
http://www.redhat.com/about/news/prarchive/2008/jboss_amazon.html
Akamai
"Akamai Gains Traction with Web Application Acceleration Service", eWeek. Minimum commitment is $10,000 per month.
Akamai Edge service with IBM WebSphere
http://www.alphaworks.ibm.com/tech/edgecomputing
http://www.akamai.com/html/technology/edgecomputing.html
IBM Deep Computing on Demand
http://www.physorg.com/news101658823.html
Google Big Data transfer
Google is working with scientific users to schlep big data sets around, and they're interested in making that data available publicly.
http://news.bbc.co.uk/2/hi/technology/6425975.stm
Resources
http://swik.net/clustering+Programming
http://www.gridblog.com/index.php?id=C0_17_1
- Beowulf on EC2
- http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3.html
- Red Hat Cluster Suite
- http://www.redhat.com/docs/manuals/csgfs/
- Presentations from 2007 Xen Summit
- http://www.xensource.com/xen/xensummit.html
DNS
Using ZoneEdit with EC2
Cool Apps (SunGrid)
- DistributedIndex
- An OSS Java document indexer using Spring.
Bloggers
http://ianfoster.typepad.com/blog/newtech/index.html
- Grid Watch articles
- http://www-128.ibm.com/developerworks/grid/library/gr-watchcol.html