Incast
From RAD Lab
DCMetro: Studying TCP Throughput Collapse (Incast) in Datacenters [Fall 2008 - Present]
Project Members
- Faculty
- Randy Katz
- Anthony Joseph
- Scott Shenker
- Postdocs
- Rean Griffith
- Students
- Yanpei Chen
- Junda Liu
- Bin Dai
- Gunho Lee
- Daekyeong Moon
- Matei Zaharia
- Ariel Rabkin
- Andy Konwinski
Objective
Study TCP Throughput Collapse to understand its causes and remedies, its implications for possible DDoS attacks and defenses, the influence of switch queueing strategies, and the consequences for the design of large-scale networks and network testbeds.
Summary
TCP Throughput Collapse (also referred to as Incast) is a pathological response in TCP implementations that results in gross under-utilization of link capacity in certain many-to-one communication configurations. This under-utilization is caused in part by packet losses at the switch and the resulting undesirable synchronization across the many senders, which respond to those losses by taking timeouts in lock-step.
Distributed storage systems are the canonical example of systems affected by Incast. The pattern of synchronized reads that characterizes the interaction between a storage client and storage servers, as the client requests blocks striped across multiple servers, creates the "right" conditions for reproducing this phenomenon. The phenomenon also affects other applications, e.g., web search and MapReduce.
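A back-of-the-envelope sketch of why lock-step timeouts are so costly may help here. It is only an illustrative model, not a result from this project: it assumes that during a timeout the bottleneck link sits completely idle, that the block otherwise transfers at full link rate, and that the retransmission timer fires at its floor (200 ms is the usual Linux default for TCP_RTO_MIN). The function name and parameters are hypothetical.

```python
def goodput_bps(block_bytes, link_bps, timeouts, rto_s):
    """Goodput of one synchronized block read under a simplified model.

    Assumptions (illustrative only): the block moves at full link rate
    except during `timeouts` idle periods of `rto_s` seconds each.
    """
    bits = block_bytes * 8
    transfer_s = bits / link_bps          # time on the wire with no loss
    idle_s = timeouts * rto_s             # time lost waiting for RTO to fire
    return bits / (transfer_s + idle_s)


if __name__ == "__main__":
    block = 1_000_000      # 1 MB block striped across the servers
    link = 1e9             # 1 Gbps client link
    for label, n, rto in [("no timeout", 0, 0.2),
                          ("one 200 ms timeout", 1, 0.2),
                          ("one 1 ms timeout", 1, 0.001)]:
        g = goodput_bps(block, link, n, rto)
        print(f"{label:>20}: {g / 1e6:8.1f} Mbps ({100 * g / link:5.1f}% of link)")
```

Under these assumptions a 1 MB read takes about 8 ms on a 1 Gbps link, so a single 200 ms timeout leaves the link idle for most of the request, while a timeout on the order of 1 ms costs far less. This is the intuition behind studying a reduced TCP_RTO_MIN.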
The goals of this project are:
- to better understand the dynamics of TCP when this phenomenon occurs
- to identify the root cause of this phenomenon
- to determine whether this exposes additional vulnerabilities of TCP to a new class of denial-of-service attack
- to develop techniques (signatures) for detecting this pathological behavior across TCP streams involved in many-to-one communication patterns
- to explore transport-level and application-level techniques that mitigate or solve this problem
Publications
Work in Progress
- List of possible solutions
- Reduced TCP_RTO_MIN (Active).
- Randomized initial RTO and reduced TCP_RTO_MIN (Active).
- Retransmission RTO multiplier = 1, 1.5, default 2 (Shown not to help, since only one timeout occurs at a time - Jan 2009).
- Randomized retransmission RTO multiplier (Shown not to help, since only one timeout occurs at a time - Jan 2009).
- Solution @ scale
- DETER: up to ~40 nodes (Active).
- DETER + Hadoop (Active).
- EC2: up to ~100s of nodes in VMs (Incast still exists, VMs don't solve all problems, more graceful degradation).
- Insights on the problem
- Tracing insights (Active, presented at OSDI WiP).
- Future stuff
- Better srtt and rto computation - backed by control theory (Future).
- Fairness with regard to original TCP (Future).
- Performance against background traffic (Future).
- Responding to flash congestion (Future).
- Parameter sensitivity analysis (Future).
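To make the RTO-related items above concrete, here is a minimal sketch of the knobs they refer to: a timer floor (the role TCP_RTO_MIN plays in Linux), a retransmission backoff multiplier, and random jitter. The SRTT/RTTVAR update follows the standard Jacobson/Karels estimator as specified in RFC 6298. Everything else is an assumption for illustration: the class name, the parameter names, the uniform jitter scheme, and the fallback to the floor before any RTT sample exists (RFC 6298 specifies a 1 second initial RTO). It is not the project's kernel patch.

```python
import random


class RtoEstimator:
    """Toy RTO calculator with a floor, a backoff multiplier and jitter."""

    def __init__(self, rto_min=0.2, backoff=2.0, jitter=0.0, rng=None):
        self.rto_min = rto_min    # timer floor in seconds (cf. TCP_RTO_MIN)
        self.backoff = backoff    # multiplier applied per retransmission
        self.jitter = jitter      # max random stretch, as a fraction (hypothetical)
        self.rng = rng or random.Random()
        self.srtt = None          # smoothed RTT
        self.rttvar = None        # RTT variation

    def sample(self, rtt):
        """Feed one RTT measurement (seconds), RFC 6298 style."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt

    def rto(self, retries=0):
        """Timeout for the (retries+1)-th transmission attempt."""
        if self.srtt is None:
            base = self.rto_min   # simplification, see lead-in
        else:
            base = max(self.rto_min, self.srtt + 4 * self.rttvar)
        base *= self.backoff ** retries
        # Stretch by a random factor in [1, 1 + jitter] to desynchronize senders.
        return base * (1 + self.rng.uniform(0, self.jitter))


if __name__ == "__main__":
    est = RtoEstimator(rto_min=0.001, jitter=0.5, rng=random.Random(0))
    est.sample(0.0001)            # 100 microsecond datacenter RTT
    print(est.rto(), est.rto(retries=1))
```

With a 100 microsecond RTT the estimated value (SRTT + 4 * RTTVAR) is far below a 200 ms floor, so in this sketch the floor alone determines the timeout. The multiplier only changes behavior when `retries` is greater than zero, that is, when a flow suffers back-to-back timeouts.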
Feedback - RAD Lab Retreat Winter 2009
- Repeat experiments for Hadoop 0.18 (currently using 0.17, major improvements to shuffling stage in 0.18).
- Larger Hadoop sort input size - capture average effect of HDFS balancing and failed tasks.
- TCP trace with Hadoop.
- Error bars for everything.
- Compare the number of retransmissions before and after OS changes.
- Repeat experiments for BSD as well as Linux?
- Use Kristal's machine learning stuff to get Incast signature.
- Use Jonathan's datacenter emulator to evaluate.
- Look into application level solutions to complement kernel modifications of TCP stack.
Reading List
- Van Jacobson, Michael Karels, Congestion Avoidance and Control, http://ee.lbl.gov/papers/congavoid.pdf
- Kevin Fall, Sally Floyd, Simulation-based Comparisons of Tahoe, Reno, and SACK TCP, http://enl.usc.edu/~cs551/readings/papers/FloydFall.pdf
- Cheng Jin, David X. Wei, Steven H. Low, FAST TCP: Motivation, Architecture, Algorithms, Performance, http://www.cs.rutgers.edu/~rmartin/teaching/fall04/cs552/readings/jin04.pdf
- R. Ludwig, K. Sklower, The Eifel Retransmission Timer, http://iceberg.cs.berkeley.edu/papers/Ludwig-Eifel-Xmit/eifel_xmit_timer.pdf
- R. Ludwig, R. H. Katz, The Eifel Algorithm: Making TCP Robust Against Spurious Retransmissions, http://iceberg.cs.berkeley.edu/papers/Ludwig-Eifel-Alg/eifel_algorithm.pdf
- Paul Barford, Mark Crovella, Critical Path Analysis of TCP Transactions, http://www.cs.bu.edu/faculty/crovella/paper-archive/sigcomm00.pdf
- R. Pan, B. Prabhakar, and A. Laxmikantha, QCN: Quantized Congestion Notification, http://www.ieee802.org/1/files/public/docs2007/auprabhakar-qcn-description.pdf
- Jeonghoon Mo, Richard J. La, V. Anantharam, J. Walrand, Analysis and Comparison of TCP Reno and TCP Vegas, http://www.eecs.berkeley.edu/~ananth/1999-2001/Richard/MoLaInfocom1999.pdf
