Cs294-spring08
Administrative Info
- David Patterson (patterson@cs), Armando Fox (fox@cs), Will Sobel (wsobel@eecs)
- Mailing list: saas@lists.eecs.berkeley.edu
- Course number: CS294-23
- CCN: 27399
- Where/when: MW 2:30-4, 405 Soda
- One-time special meeting: Friday 1/25, 2:30, 405 Soda - Rob O'Callahan visiting from Mozilla
- Units: 2 units without a project (scribing some lectures instead), 3 units with a project
Projects (and Weds 3/12 appointment times, in 413 Soda)
- 2:30 Michael Armbrust, Gunho Lee - Characterizing the variance of EC2 machine performance, network topology, etc. and developing recommendations for deployment
- 2:45 Matei Zaharia and Andyk - try to identify a CS262 project that could be joint with this class. Possibility: using the modified/improved Hadoop and the Hadoop trace data as a case study of Michael's findings; how should Hadoop users deploy on EC2. Correlate results for Hadoop to Michael's findings specifically.
- 3:00 Raluca Sauciuc - Concolic testing for JavaScript code, aiming at improved code coverage compared to current testing techniques. Suggestion: look at widely-used libraries such as Script.aculo.us, Prototype, Google Gears, Adobe AIR.
- 3:15 Ryan Waliani - TBD: contribute to one of the existing RAD Lab projects in machine-learning analysis of data; to discuss on Weds.
- 3:30 Sonesh Surana
- 3:45 Jingtao Wang
Web 2.0 "service building blocks"
- Building blocks of Web 2.0 server side
- Overview of:
- Ad serving
- checkout/financial txns
- geocoding
- messaging/IM
- storage: S3, Dynamo and/or Bigtable paper/speaker
- data-parallel batch processing (Hadoop/MapReduce)
- collab filtering: Jordan to suggest overview paper
- social networking - Facebook
- search: talk from Lucene
- content distribution: Coral, Akamai
- Networking in Web 2.0 (Thacker from MSR)
- "Classic" vs "utility computing" SaaS
- Classic: Salesforce, Google Docs; can subdivide into whether requires interactive client latency or not
- Utility: EC2, Planetlab, Emulab; Virtualization (per-node VMs as well as large scale utility computing)
- Client side issues
- Client architecture: binary (Google Earth) vs "Web 1.0" client vs "Web 2.0" rich client (Google Maps/Docs, ?? Dave has reference from HPTS comparing these)
- Web 2.0 Application case studies
- Craig Harper - Apisphere - mobile IM with geocoding
Project ideas
In general, each project can have a Phase 1 where it's deployed/benchmarked/measured on our small-scale local cluster(s) and a Phase 2 using large scale EC2 or similar.
- Is there really no 80/20 rule for how Web pages stress browsers?
- Is it really true that "most" Javascript is not executed more than once (hence no gain to JIT)? What about using a separate core (on multicore machine) to JIT, making it "free"?
- How "compliant" (CSS, XHTML, JS extensions, etc.) are different Web sites? Could we provide a browser plugin that figures this out in the background and reports back to a UCB-hosted database, and peer pressure on the publishers of those sites would result in site improvements?
- Repeatability/consistency of experiments on EC2: what are the distributions of actual allocated CPU, internal latency, etc. (eg: EC2 vs. VM on RAD Lab-owned hardware vs. bare metal)? How could these findings influence the design of VM monitors?
- Paper design of declarative datacenter markup
- RoR scalability: how fast can it respond to load spikes?
- SCADS scaling
- Performance modeling (Peter/Charles)
- Using AjaxScope to measure client Javascript performance (eg, implement the "random sampling of instrumentation" idea from Liblit's work for Javascript apps)
- Design considerations for new Web 2.0 service (with or without prototype)
- Propose a way to model a domain and propose a standard for representing some new Web 2.0 service element (eg, analogous to OpenSocial) - eg, content distribution for streaming media
- Social network service with complex ACL to allow only certain third parties to gain access to the data. Also capable of storing encrypted personal data.
- application simulator (aka DummyApp) [Peter/George]
- each server runs multiple worker threads, each of which can execute the following operations: disk, memory, cpu, network, sleep, call, fork/call
- these servers can be grouped into web application tiers (web servers, app server, storage, ...) and the workload generator generates requests that specify which operations should be executed at each tier
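The DummyApp idea above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the actual design by Peter/George: the names `Server`, `run_request`, and the `op_*` handlers are hypothetical, only the cpu, memory, and sleep operations are simulated (disk, network, call, and fork/call are omitted), and tiers are visited sequentially in the order the request lists them.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical operation handlers (assumption: each takes one integer "amount").
def op_cpu(amount):
    """Burn CPU by summing squares of the first `amount` integers."""
    total = 0
    for i in range(amount):
        total += i * i
    return total

def op_memory(amount):
    """Allocate `amount` bytes and return the allocated size."""
    return len(bytearray(amount))

def op_sleep(amount):
    """Sleep for `amount` milliseconds."""
    time.sleep(amount / 1000.0)
    return amount

OPS = {"cpu": op_cpu, "memory": op_memory, "sleep": op_sleep}

class Server:
    """One simulated server in a tier, backed by a pool of worker threads."""
    def __init__(self, tier, workers=2):
        self.tier = tier
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def handle(self, ops):
        """Run a list of (op_name, amount) pairs on one worker thread."""
        def work():
            executed = []
            for name, amount in ops:
                OPS[name](amount)
                executed.append(name)
            return executed
        return self.pool.submit(work).result()

def run_request(tiers, request):
    """`request` maps tier name -> operations to execute at that tier."""
    trace = []
    for tier_name, ops in request.items():
        trace.append((tier_name, tiers[tier_name].handle(ops)))
    return trace

# A workload generator would emit many requests like this one.
tiers = {name: Server(name) for name in ("web", "app", "storage")}
request = {
    "web": [("cpu", 1000)],
    "app": [("memory", 4096), ("cpu", 500)],
    "storage": [("sleep", 5)],
}
trace = run_request(tiers, request)
print(trace)
```

Keeping the per-tier operation list inside the request itself means the workload generator, not the servers, decides how each tier is stressed, which matches the description above.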
Project ideas would be critiqued by one or two invited domain experts.
Papers not yet slotted on any particular date
- GFS
- Dynamo - Amazon's Highly Available Key-Value Store (Peter Vosshall et al., Amazon.com, SOSP 07)
- An Evaluation of Amazon's Grid Computing Services: EC2, S3 and SQS
- Experience with some Principles for Building an Internet-Scale Reliable System (Mike Afergan et al., Akamai, WORLDS'05 workshop)
- MSR Asia CHT paper?
Approximate syllabus by week
Links to guest speaker talk titles and abstracts
Links to discussion and scribe notes
- 1/23 Intro Slides coming soon
- 1/25 (special meeting) Rob O'Callahan
- 1/28 Read and discuss AjaxScope: A Platform for Remotely Monitoring the Client-Side Behavior of Web 2.0 Applications (Emre Kiciman et al., Microsoft Research, SOSP 2007)
- Scribe: David Poll
- Scribe Notes: Link
- 1/30 Chuck Thacker, Microsoft Inc.
- 2/4 Thorsten von Eicken, RightScale Inc.
- Time/Place: 2:00PM in room 465 (RAD Lab), 4th Floor of Soda Hall
- Title: The Future of Software: In the Cloud
- Scribe: Bryce Lee
- Scribe Notes: Link
- 2/6 Class discussion of last 4 speakers (leading to possible projects)
- Scribe: Bryce Lee
- Scribe Notes: Link
- 2/11 Charles Gordon, Amazon Inc.
- Time/Place: 2:30pm in 465 (RAD Lab)
- Title: IMDb's Architecture and Plans to Improve it
- Scribe: Simon Tan
- Scribe Notes: Link
- 2/13 Chris Olston, Yahoo Inc.
- Time/Place: 2:00PM in the Wozniak Lounge, 4th Floor of Soda Hall
- Title: "Processing Web-Scale Data with Pig"
- Scribe: Simon Tan
- Scribe Notes: Link
- 2/18 NO CLASS - President's Day
- 2/20 Andrew Fikes - Google Inc.
- Time/Place: 2:00PM in the Wozniak Lounge, 4th Floor of Soda Hall
- Title: Google's Scalable Architecture: GFS, Bigtable, and MapReduce
- To read: Bigtable: A Distributed Storage System for Structured Data (Fay Chang et al., Google, OSDI '06)
- Slides: Image:Andrew-Fikes Google Big-table slides.pdf
- Scribe: Andyk
- Scribe Notes: Link
- 2/25 Bob Felderman, Google Inc.
- Time/Place: 2:00PM, room 465 (RAD Lab)
- Title: Datacenter Networking: Feeds, Speeds and Needs
- Scribe: Andyk
- Scribe Notes: Link
- 2/27 Class discussion of last 4 speakers (leading to projects)
- Scribe: Jeff Tang
- Scribe Notes: Link
- 3/3 Andrew Gordon, Microsoft Research
- Time/Place: 2:00PM, room 465 (RAD Lab)
- Title: Service Combinators for Farming Virtual Machines
- Scribe: Jeff Tang
- Scribe Notes: Link
- 3/5 Ari Steinberg, Facebook
- Time/Place: 2:00PM, In the Wozniak Lounge (4th Floor of Soda Hall)
- Title: Facebook Architecture and Abstractions
- Scribe: Kurtis Heimerl
- Scribe Notes: Link
- 3/10 Parallel In Class Project Discussions
- Scribe: Sameer Iyengar
- Scribe Notes: Link
- 3/12 Individual Meeting
- 3/17 NO CLASS due to CS Visit Day
- 3/19 Project Checkpoint / Midcourse correction
- 3/24 NO CLASS - Spring Break
- 3/26 NO CLASS - Spring Break
- 3/31 <TBD>
- 4/2 Parallel In Class Project Discussions
- 4/7 <TBD>
- 4/9 Project Checkpoint / Midcourse correction (Cancelled)
- 4/14 Project Checkpoint / Midcourse correction
- Scribe: Jingtao Wang
- Scribe Notes: Link
- 4/16 <TBD>
- 4/21 <TBD>
- Scribe: Kuang Chen
- Scribe Notes: Link
- 4/23 Early deadline for potential OSDI submissions (5/8 is OSDI drop dead date)
- 4/28 No Class
- 4/30 Poster Session - Link
- 5/5 Term paper deadline Midnight (unless OSDI, then 5/8)
- 5/7 (Last day of classes) What did we learn? Reflections on SaaS
