Projects/SCADS

From RAD Lab

SCADR Link

scadr.radlab.net

Spring 2009

Requirements

Fall 2008

APIs

1) Priority Queue

Supporting Class(es)

class SLAInfo {
    // fields still to be defined (up to Michael)
}

class QueueObject {
    String updateId;   // passed back in the finish() call
    String indexId;
    String colFamCol;  // in the form ColumnFamily:Column
    String data;
    SLAInfo slaInfo;
}

API:

  • void enqueue(QueueObject obj) // put obj on the queue (Thread Safe)
  • QueueObject dequeue() // take an object off the queue (Thread Safe)
  • void finish(String updateId) // inform the queue that I'm finished with update
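The three calls above can be sketched as follows. This is only an illustration under stated assumptions, not the project's implementation: the `priority` field on `SLAInfo` is hypothetical (the notes leave `SLAInfo` undefined), as are the class name `UpdateQueue`, the in-flight bookkeeping behind `finish`, and the choice to return `None` from `dequeue` when the queue is empty. The Java-like signatures are translated to Python naming.

```python
import heapq
import itertools
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SLAInfo:
    # Hypothetical field: the notes leave the contents of SLAInfo undecided.
    priority: int = 0  # assumption: lower value is dequeued sooner


@dataclass
class QueueObject:
    update_id: str     # updateId, passed back in finish()
    index_id: str      # indexId
    col_fam_col: str   # colFamCol, in the form "ColumnFamily:Column"
    data: str
    sla_info: SLAInfo = field(default_factory=SLAInfo)


class UpdateQueue:
    """Thread-safe priority queue that tracks dequeued-but-unfinished updates."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._heap = []                # entries are (priority, seq, obj)
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority
        self._in_flight = {}           # update_id -> dequeued object

    def enqueue(self, obj: QueueObject) -> None:
        with self._lock:
            entry = (obj.sla_info.priority, next(self._seq), obj)
            heapq.heappush(self._heap, entry)

    def dequeue(self) -> Optional[QueueObject]:
        """Return the most urgent object, or None if the queue is empty."""
        with self._lock:
            if not self._heap:
                return None
            _, _, obj = heapq.heappop(self._heap)
            self._in_flight[obj.update_id] = obj
            return obj

    def finish(self, update_id: str) -> None:
        """Mark a previously dequeued update as done."""
        with self._lock:
            self._in_flight.pop(update_id, None)

    def in_flight_count(self) -> int:
        with self._lock:
            return len(self._in_flight)
```

A single lock around the heap and the in-flight map keeps all three calls thread safe; a blocking `dequeue` would be an equally reasonable design.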


2) Index Format

An index is a long list of values stored as the columns of a single row. Each value has the form Table|Key|colFam:col and names the item that the index entry points to. Each index will live either in its own instance of Cassandra or in its own table; in either case it is identified internally by a string key.
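As a rough illustration of the Table|Key|colFam:col value format, the sketch below builds and parses one index entry. The class name `IndexEntry` and its methods are hypothetical, and the sketch assumes that table names and keys never contain `|` and that column family names never contain `:`; the notes do not say how such characters would be escaped.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    table: str
    key: str
    col_fam: str
    col: str

    def encode(self) -> str:
        """Render the entry as Table|Key|colFam:col."""
        return f"{self.table}|{self.key}|{self.col_fam}:{self.col}"

    @classmethod
    def decode(cls, value: str) -> "IndexEntry":
        """Parse Table|Key|colFam:col; raises ValueError if malformed."""
        table, key, col_fam_col = value.split("|")
        col_fam, col = col_fam_col.split(":", 1)
        return cls(table, key, col_fam, col)
```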


Design Goals

  1. Data is r/w (user believes data is mutable)
  2. Scale up and down in terms of data size and request rate (1000s of commodity servers)
    1. You can't buy your way out of the problem.
  3. Implementation cost per user or per unit of data doesn't grow as the system scales
  4. Interactive response time doesn't grow with number of users or dataset size
  5. High Availability
  6. Durability
  7. Data model/transactional semantics?
  8. Support queries over hierarchical associations.
  9. Online scale up and down (datasize and throughput)
  10. "Interactive" resp
  11. (Query Rate) / (#Machines) = K
    1. For any dataset size
  12. As cheap as MySQL on the same hardware
  13. Highly Available 99.999%
  14. E2E Durability = 99.999%
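Two of the goals above are plain arithmetic, worked here as a sketch. The function names and the sample numbers (10,000 queries/s, K = 500 queries/s per machine) are made up for illustration; only the 99.999% target and the relation (Query Rate) / (#Machines) = K come from the list.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Downtime per year permitted by an availability target such as 0.99999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def machines_needed(query_rate: float, k: float) -> int:
    """Machines required if each machine sustains a constant k queries/s."""
    return math.ceil(query_rate / k)


# 99.999% availability leaves a little over five minutes of downtime per year.
downtime = allowed_downtime_minutes(0.99999)

# With K held constant, the machine count grows linearly with the query rate.
small = machines_needed(10_000, 500)
large = machines_needed(20_000, 500)
```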

Implications

  • Query results must be small and must not scale with system size
  • Data touched by a query doesn't scale with dataset size
  • No Single Copy Consistency
  • (#refs)*(#ref avg latency) < SLA
  • Query time < O(datasize)
  • O(1) OK... O(lg n) ??
  • redundancy to minimize the likelihood of correlated loss/failure
  • 5 -> Shared Nothing
  • 1+3+4 -> No Single Copy Consistency
  • 3 -> (#refs)*(#ref avg latency) < SLA
  • 3+4 -> Query time < O(datasize)
    • O(1) OK... O(lg n) ??
  • 6+7 -> redundancy to minimize the likelihood of correlated loss/failure
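The budget (#refs)*(#ref avg latency) < SLA and the open question about O(lg n) can be checked with a few lines. This is a sketch with invented numbers: the 2 ms average reference latency, the 100 ms SLA, and the one-billion-item dataset are assumptions, and the lg n count is simply what a balanced binary search structure would touch.

```python
import math


def within_sla(num_refs: int, avg_ref_latency_ms: float, sla_ms: float) -> bool:
    """True if (#refs) * (avg latency per ref) stays strictly below the SLA."""
    return num_refs * avg_ref_latency_ms < sla_ms


def log_refs(dataset_size: int) -> int:
    """References touched by an O(lg n) lookup over dataset_size items."""
    return math.ceil(math.log2(dataset_size))


# An O(lg n) lookup over a billion items touches about 30 references,
# which at 2 ms each fits a 100 ms SLA.
refs = log_refs(10**9)
fits = within_sla(refs, 2.0, 100.0)

# Fifty references at 2 ms each reach 100 ms and no longer meet the strict bound.
too_many = within_sla(50, 2.0, 100.0)
```

Under these assumed numbers a logarithmic number of references stays inside the budget, whereas anything linear in the dataset size clearly would not.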

Implementation

  • B => Adjustable Consistency
  • C => DRAM + Flash to absorb > 90% accesses
  • D => No Scans
  • B + D => What queries can be expressed
    • => Abstraction for time


Meeting Notes

2008-09-16

Useful Links

Using Cassandra

EC2 Bootstrap Guide

Deploying with Chef