RADSClassFall06/labs/lab3

Lab 3: saturate N machines

In this lab you'll get 6 (virtual) machines on which to deploy your app. As in lab 1, you have to find the saturation point of this configuration.

You're welcome to keep the same teams, but not obligated to do so.

Some things to keep in mind:

  • You can use your new-and-improved configuration and app code from lab 2 as a starting point.
  • Plan to use 1 physical server for load generation and the remaining physical servers to run the VMs corresponding to the pieces of the app. You can run 2 load-generating VMs on the load-generation server.
    • To make your life easier, you can use the multi_server script in /work/jyzhang/.
      • This script lets you start and stop webservers, dispatchers, and databases from a single machine.
      • It is best to run this from your load generator VM.
      • You need to install the net-ssh package for Ruby to use it (run sudo gem install net-ssh).
      • Edit the script to point to your servers.
  • 2 teams will have to share each set of physical servers via the reservation system (link on cheat sheet), so don't leave this until the last minute.
  • In your final configuration, make sure that the load generator is not the bottleneck, i.e., that you have not maxed out load generation while the app servers are still not busy.
  • Initially, consider starting with a single lighttpd that talks to multiple remote FCGI dispatchers. Scripts will be provided on this page to help with starting remote dispatchers.
  • Now that there are more degrees of freedom -- how many webservers, how many dispatchers, how many databases -- you may have to do a couple of iterations if you run into bottlenecks early. A suggested starting configuration is 1 webserver, 1 database, and N dispatchers.
  • Although you can change the source code, the low-hanging fruit was picked in lab 2. We suggest you explore the space of configurations instead (caching, pipeline balancing of webservers vs. dispatchers vs. DB, etc.).
  • The "external search" feature introduces variability in the response time that depends on the behavior of external servers. We suggest you disable this feature at first, get some consistent results, and then re-enable it to see how it affects your results.
    • To disable external search feature:
      • Go to the search controller
      • Find the method: ajax_externalsearch
      • Replace line: @results = search_ext_db(params[:title]) with: @results = []
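The check that the load generator is not the bottleneck can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: the helper names, the 0.9 threshold, and the utilization figures are all hypothetical, and CPU utilization is just one of several resources you may need to look at.

```ruby
# Hypothetical helper: given per-role CPU utilization (0.0..1.0) sampled
# during a run, list the roles that look saturated.
def saturated_roles(utilization, threshold = 0.9)
  utilization.select { |_role, cpu| cpu >= threshold }.keys
end

# True when the load generator is the only saturated role, i.e. the run
# measured the limits of the load generator rather than those of the app.
def load_generator_is_bottleneck?(utilization, threshold = 0.9)
  saturated = saturated_roles(utilization, threshold)
  saturated.include?(:load_generator) && (saturated - [:load_generator]).empty?
end

# Made-up sample numbers, for illustration only.
SAMPLE = { load_generator: 0.97, webserver: 0.35, dispatchers: 0.60, database: 0.40 }
puts "load generator is the bottleneck" if load_generator_is_bottleneck?(SAMPLE)
```

If this reports the load generator as the only saturated role, add load-generation capacity (for example the second load-generating VM) before drawing conclusions about the app.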

Some things to put in your writeup

  1. Were you able to observe linear scaling? That is, with N machines, can you handle N times the workload of lab 2 with the same response time? (If you can handle more, there was a bottleneck in lab 2 that you missed. If you can handle less, there is a horizontal-scaling bottleneck that is holding you back.)
  2. What bottlenecks did you observe that were not present in labs 1 and 2?
  3. If you had time to re-enable external search, what qualitative effects did it have on your results relative to having it disabled?
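For the linear-scaling question, one way to summarize your measurements is a scaling-efficiency ratio: the throughput sustained on N machines divided by N times the lab 2 throughput, both taken at the same response time. The sketch below is a minimal illustration. The function names, the 5% tolerance, and the sample numbers are assumptions, not part of the lab.

```ruby
# Scaling efficiency: measured throughput on N machines divided by
# N times the baseline throughput. 1.0 means perfectly linear scaling.
def scaling_efficiency(baseline_rps, scaled_rps, n_machines)
  unless baseline_rps > 0 && n_machines > 0
    raise ArgumentError, "baseline and machine count must be positive"
  end
  scaled_rps.to_f / (baseline_rps * n_machines)
end

# Map the ratio onto the three cases described in writeup question 1.
# The tolerance is an arbitrary allowance for measurement noise.
def interpret(efficiency, tolerance = 0.05)
  if efficiency > 1.0 + tolerance
    :missed_bottleneck_in_lab2
  elsif efficiency < 1.0 - tolerance
    :horizontal_scaling_bottleneck
  else
    :roughly_linear
  end
end

# Made-up numbers: 40 req/s in lab 2, 96 req/s on 3 machines.
eff = scaling_efficiency(40, 96, 3)
puts format("efficiency: %.2f (%s)", eff, interpret(eff))
```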

Configuration instructions

These are being filled in as we go. You'll need to do some configuration to get the app to run in a distributed environment:

  1. A larger database (~1GB) will be provided, so that the whole benchmark doesn't fit in memory
  2. Starting dispatchers on multiple machines
  3. Correctly configuring memcached for a multiple-machine setup
  4. (Optional) multiple MySQL instances (master-slave configuration)

Basic setup

To set up researchindex to run on 5 VMs (one for lighttpd, three for dispatchers, and one for MySQL), follow these steps:

  1. create new copies of the labVM (follow the steps for cloning VMs in the cheat sheet) and put each on a different physical server (replace the VM names below with the names of your VMs):
    • vm1.vm: will run lighttpd
    • vm{2|3|4}.vm: will run dispatchers
    • vm5.vm: will run MySQL and memcached
    • vm6.vm: will run the load generator
  2. vm5.vm: change the MySQL privileges:
    • ssh -Y vm5.vm from one of the X machines, and run mysql-administrator&. Select "User administration", select "root", and then "@%" (these are the settings for the root user when connecting from a remote host). Select "Schema privileges", select the RI_fake_med schema, select all "available privileges" (ctrl+A), and press the left arrow button. Repeat for all the other researchindex databases and hit "Apply changes".
  3. vm{2|3|4}.vm
    • find the following line in config/environment.rb: memcache_servers = [ 'localhost:11211' ] and replace it with memcache_servers = [ 'vm5.vm:11211' ]. This will point Rails to the memcached running on vm5.
    • point Rails to MySQL on vm5.vm: in config/database.yml, replace host: 127.0.0.1 in the production section with host: vm5.vm
    • start the dispatchers: run ./spawner -p 9000 -i 5, which will start 5 dispatchers listening on ports 9000, 9001, 9002, 9003, and 9004.
  4. vm1.vm: point lighttpd to the dispatchers running on vm{2|3|4}.vm:
    • in /etc/lighttpd/lighttpd.conf, comment out the following section:

fastcgi.server = (
  ".fcgi" => (
    "localhost" => (
      "min-procs" => 1,
      "max-procs" => 10,
      "socket" => "/tmp/fcgi.socket",
      "bin-path" => "/www/dispatch.fcgi"
    )
  )
)

and add the following one (replace the IP addresses with the IP addresses corresponding to your dispatcher VMs):

fastcgi.server = ( ".fcgi" =>
  ( "ri2-0" => ( "host" => "192.168.7.102", "port" => 9000 ),
    "ri2-1" => ( "host" => "192.168.7.102", "port" => 9001 ),
    "ri2-2" => ( "host" => "192.168.7.102", "port" => 9002 ),
    "ri2-3" => ( "host" => "192.168.7.102", "port" => 9003 ),
    "ri2-4" => ( "host" => "192.168.7.102", "port" => 9004 ),
    "ri3-0" => ( "host" => "192.168.7.103", "port" => 9000 ),
    "ri3-1" => ( "host" => "192.168.7.103", "port" => 9001 ),
    "ri3-2" => ( "host" => "192.168.7.103", "port" => 9002 ),
    "ri3-3" => ( "host" => "192.168.7.103", "port" => 9003 ),
    "ri3-4" => ( "host" => "192.168.7.103", "port" => 9004 ),
    "ri4-0" => ( "host" => "192.168.7.104", "port" => 9000 ),
    "ri4-1" => ( "host" => "192.168.7.104", "port" => 9001 ),
    "ri4-2" => ( "host" => "192.168.7.104", "port" => 9002 ),
    "ri4-3" => ( "host" => "192.168.7.104", "port" => 9003 ),
    "ri4-4" => ( "host" => "192.168.7.104", "port" => 9004 )
  )
)

    • restart lighttpd and point your browser to vm1.vm

However, using the above configuration may give you very poor results, because lighttpd's round-robin scheduler is simplistic: it takes the FCGI dispatchers in the order in which you listed them (not by IP). So in the above setup, if you run 5 concurrent threads, they will all end up on one dispatcher machine and leave the other two idle. A quick solution is to reorder the list in which you present the FCGI dispatchers...
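One possible reordering is to interleave the entries so that consecutive entries point at different machines. The Ruby sketch below generates such a block. It assumes the scheduler walks the list in the listed order, as described above. The helper names are hypothetical, and the labels and IP addresses are the example values from the configuration above, so substitute your own.

```ruby
# Hypothetical helper: returns backend entries ordered so that consecutive
# entries point at different hosts (all hosts on the first port, then all
# hosts on the next port, and so on).
def interleaved_backends(hosts, base_port, per_host)
  (0...per_host).flat_map do |i|
    hosts.map { |name, ip| { label: "#{name}-#{i}", host: ip, port: base_port + i } }
  end
end

# Renders the entries as a lighttpd fastcgi.server block.
def fastcgi_block(backends)
  entries = backends.map do |b|
    format('    "%s" => ( "host" => "%s", "port" => %d )', b[:label], b[:host], b[:port])
  end
  "fastcgi.server = ( \".fcgi\" =>\n  (\n" + entries.join(",\n") + "\n  )\n)"
end

# Example values copied from the configuration above; replace with your VMs.
HOSTS = {
  "ri2" => "192.168.7.102",
  "ri3" => "192.168.7.103",
  "ri4" => "192.168.7.104"
}
BACKENDS = interleaved_backends(HOSTS, 9000, 5)
puts fastcgi_block(BACKENDS)
```

With this ordering the first three entries are ri2-0, ri3-0, and ri4-0, so a small number of concurrent requests should already be spread over all three dispatcher machines. Paste the printed block into /etc/lighttpd/lighttpd.conf in place of the grouped-by-host version and restart lighttpd.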
