SCADS
SCADS: Scalable, Consistency-Adjustable Document Store.
A nonrelational, horizontally scalable storage system for structured data that focuses on exposing the tradeoff between consistency and availability/performance (i.e., replication), especially when updates must propagate through a complex data graph.
- Students: Michael Armbrust
- Project homepage
More Detail:
While relational algebra created a revolution in data management and a new industry around relational database management systems (RDBMSs), there is little disagreement that today’s Web applications have different needs. The ACID (atomicity, consistency, isolation, durability) guarantees provided by RDBMSs are stronger than most Web applications need, and fully general relational queries are more expressive than many Web applications need. Yet the engineering required to combine those properties in a conventional RDBMS means that it scales less well and costs more than special-purpose storage systems that sacrifice one or more of the properties. For example, Amazon’s Dynamo and Google’s BigTable give up one or more of these properties in order to achieve far greater scale and higher throughput than any existing RDBMS, and even our ROC storage system prototypes relaxed ACID to facilitate crash-only design and SML-based automated monitoring.
Many “one-off” specialized storage systems have been built, but each requires rewriting the application to use that storage system, which explains the longevity of SQL as an implementation-independent abstraction for describing operations on stored data. With SCADS, we are working with Facebook, the Internet Movie Database, Amazon, and eBay to capture use cases for their large-scale distributed databases. The goal is to develop both a formalism comparable to SQL for reasoning about such applications’ storage needs and a prototype of a “SCADS engine” that can scale to 1,000 machines on Amazon EC2.
We see an opportunity for a new abstraction with the advent of Ruby on Rails, whose Active Record middleware layer provides an object-graph model that fits the needs of many Web applications. Given the uptake of Ruby on Rails, a new abstraction that is near-compatible with Active Record would be much less disruptive than a completely new programming model. We believe this abstraction needs to be grounded in a formalism for two reasons: it would facilitate the kinds of optimizations that today’s query optimizers perform on SQL queries (by applying relational algebra transformations), and it could provide an implementation-independent specification for building future consistency-adjustable storage systems.
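
To make the idea of "adjustable" consistency concrete, here is a minimal toy sketch in Python. It is not the SCADS design or API. The class names (`Replica`, `ToyStore`), the `max_versions_behind` parameter, and the single-primary layout are all assumptions made up for illustration. It shows only the general shape of the tradeoff: a read may be served from a possibly stale replica when the caller tolerates staleness, and otherwise falls back to the up-to-date copy.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; maps key -> (version, value). Hypothetical."""
    data: dict = field(default_factory=dict)

    def apply(self, key, version, value):
        self.data[key] = (version, value)

    def get(self, key):
        # Unknown keys are reported as version 0 with no value.
        return self.data.get(key, (0, None))


class ToyStore:
    """Hypothetical store: writes hit a primary, replicas catch up lazily."""

    def __init__(self, n_replicas=2):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # updates not yet propagated to the replicas
        self.version = 0   # global write counter, used as the version number

    def write(self, key, value):
        self.version += 1
        self.primary.apply(key, self.version, value)
        self.pending.append((key, self.version, value))

    def propagate(self):
        """Push every pending update to every replica."""
        for key, version, value in self.pending:
            for replica in self.replicas:
                replica.apply(key, version, value)
        self.pending.clear()

    def read(self, key, max_versions_behind=0, replica=0):
        """Serve from a replica if it is within the caller's staleness bound.

        Simplification: comparing against the primary's version is free here;
        a real distributed system would have to pay for that knowledge.
        """
        primary_version, primary_value = self.primary.get(key)
        replica_version, replica_value = self.replicas[replica].get(key)
        if primary_version - replica_version <= max_versions_behind:
            return replica_value
        return primary_value
```

For example, after writing `"v1"`, propagating, and then writing `"v2"` to the same key, a strict read (`max_versions_behind=0`) returns `"v2"`, while a read that tolerates one version of staleness may return `"v1"` from the replica until the next propagation.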