Log and Code Repository

From RAD Lab

Berkeley Web 2.0 operations log and code repository

RAD Lab is a new research project at the EECS department of UC Berkeley. We want to enable people to build Internet applications that are more reliable, more scalable, and easier to deploy and operate. To evaluate our ideas, it is extremely useful to have traces and logs from real-world Internet sites that we can use in our experiments.

If you run a web site and you want to help us build a better Internet, read on and contact us at bodikp[at]cs[dot]berkeley[dot]edu!

How you can help

We want to create a realistic environment where we can run our experiments. You can help us by providing traffic and source code information about your site. We will ANONYMIZE the information (or help you do so before you send it to us) and we will not give it out to ANYONE without your permission.

logs
Web server access/error logs are very useful for understanding the traffic to your site. Do you have logs for other parts of the system, such as the database or J2EE middleware? Do you log your calls to the Flickr or Amazon APIs?
source code
With the source code we can run your application on our servers and perform more extensive experiments. We can essentially host your site and replay the traffic you saw in a "sandboxed" environment. At some point we hope to be able to actually offer free hosting.
your experience running the site
Did you have any failures or bugs in your code? Were you Slashdotted or Dugg? Do you have interesting anecdotes about operating your site? How did you track down and fix/recover from any tricky problems? We are interested in all this!

Are you interested? Contact us at bodikp[at]cs[dot]berkeley[dot]edu.

Anonymizing the data

We can help you anonymize your data. Some data we already have has been sanitized as follows:

  • IP addresses of the clients were replaced with their hashes
  • some parts of the URLs were also replaced with hashes (for example, parts of the URL that contain login names of the users or other potentially identifying information, such as domains)
  • the logs were subsampled (some fraction of the data was filtered out), so we have a representative sample rather than comprehensive traces

We'll work with you to figure out how to sanitize your logs to the point where you're comfortable having them used for research purposes.
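The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used on the data we already hold: it assumes Common Log Format access logs, a hypothetical URL layout in which the login name follows "/user/", and a keyed hash (HMAC-SHA256) with a private key, since plain unsalted hashes of IP addresses can be reversed by brute force. The names SECRET_KEY, pseudonym, anonymize_line, and sanitize are made up for this example.

```python
import hashlib
import hmac
import random
import re
from typing import Iterable, Iterator, Optional

# Hypothetical secret key: keep it private and never send it along with the logs.
SECRET_KEY = b"replace-with-a-private-random-key"

# Assumes Common Log Format: client IP first, request line inside double quotes.
LINE_RE = re.compile(
    r'^(?P<ip>\S+)(?P<middle> .*?")(?P<method>[A-Z]+) (?P<url>\S+)(?P<rest>.*)$'
)
# Hypothetical URL layout: the login name is the path segment after "/user/".
USER_RE = re.compile(r"(/user/)([^/?\s]+)")


def pseudonym(value: str, length: int = 16) -> str:
    """Return a keyed hash of value, truncated to `length` hex characters."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def anonymize_line(line: str) -> Optional[str]:
    """Hash the client IP and any login name in the URL; None if unparseable."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    url = USER_RE.sub(lambda u: u.group(1) + pseudonym(u.group(2)), m.group("url"))
    return (
        f'{pseudonym(m.group("ip"))}{m.group("middle")}'
        f'{m.group("method")} {url}{m.group("rest")}'
    )


def sanitize(
    lines: Iterable[str], keep_fraction: float = 0.1, seed: int = 0
) -> Iterator[str]:
    """Keep roughly keep_fraction of the lines and anonymize the ones kept."""
    rng = random.Random(seed)
    for line in lines:
        if rng.random() >= keep_fraction:
            continue  # subsampling: drop this line
        clean = anonymize_line(line.rstrip("\n"))
        if clean is not None:
            yield clean
```

Because the same input always maps to the same pseudonym, requests from one client can still be grouped together after sanitizing, which is what makes the logs useful for replaying traffic. Lines that do not parse are dropped rather than passed through, so nothing unexamined leaks into the output.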

Need more control over your logs?

Are you a larger company that requires more control over its logs? If required, members of RAD Lab can sign a non-disclosure agreement with your company governing our use of the log data.

How we will use your data

We will use your logs and applications to recreate a realistic Internet environment where we will experiment with new ideas and algorithms. Here are two projects where we developed new techniques for detecting and localizing failures in Internet services:

  • combining automatic detection of failures with visualization (pdf)
  • detecting and localizing failures in J2EE systems (pdf)

What you get in return

All results and tools will be freely available under the BSD license.

Furthermore, we hope to be able to offer free hosting facilities managed using the techniques developed in the research project. We would try to give preference to the sites whose log data helped us test those techniques in the first place.

Who's contributed so far?

Anonymous 1
Anonymous 2
Ebates.com link
access logs, description of failures
Acme.com link
access logs
Groupr link
access logs
Matchr link
access/error logs, source code
Spell with Flickr link
access/error logs, source code
krazydad.com link, coverpop.com link
access logs
Fastr link
access logs