North American Network Operators Group

Re: Westnet and Utah outage
I made a private reply to Curtis on his posting earlier this week, and he gave a nice analysis and cc'd end2end-interest rather than nanog. For those that don't care to read all this, here's the summary:

> Which would you prefer? 140 msec and 0% loss or 70 msec and 5% loss?

So we get to choose between large delay and large lossage. Doesn't sound wonderful...

I thought you folks in nanog might be interested, so with Curtis' permission, here's the full exchange (the original posting by Curtis is at the very end).

-- Jim

Here's what I wrote:

> In message <[email protected]>, Jim Forster writes:
> > Curtis,
> >
> > I think these days for lots of folks the interesting question is not what happens when a single or a few high-rate TCPs get in equilibrium, but rather what happens when a DS-3 or higher is filled with 56k or slower flows, each of which only lasts for an average of 20 packets or so. Unfortunately, these 20-packet TCP flows are what's driving the stats these days, due I guess to the silly WWW (TCP per file; file per graphic; many graphics per page) that's been so successful.

And Curtis's reply:

> The analysis below also applies to just under 800 TCP flows each getting 1/800th of a DS3 link, or about 56 Kb/s. The loss rate on the link should be about one packet in 11 if the delay can be increased to 250 msec. If the delay is held at 70 msec, lots of timeouts, terrible fairness, and poor overall performance will result.
>
> Do we need an ISP to prove this to you by exhibiting terrible performance? If so, please speak to Jon Crowcroft. His case is 400 flows on 4 Mb/s, which is far worse, since delay would have to be increased over 3 seconds or segment size reduced below 552. :-(
>
> > I could try to derive the results but I'm sure you or others would do better :-). How many of the packets in the 20-packet flow are at equilibrium? What's the drop rate?
Hmmm, very simple minded analysis says > > that it will be large: expontential growth (doubling cwnd every ack) should > > get above best case pretty quickly, certainly within the 20 packet flow. > > Assume it's only above optimum once, then the packet loss rate is 1 in 20. > > Sounds grim. Vegas TCP sounds better for these reasons, since it tracks > > actual bw, but I'm not really qualified to judge. > > > > -- Jim > > > Jim, > > The end2end-interest thread was quite long and I didn't want to repeat > the whole thing. The initial topic was very tiny TCP flows of 3 to 4 > packets. That is a really bad problem, but should no longer be a > realistic problem once HTTP is modified to allow it to pick up both > the HTML page and all inline images in one TCP connection. > > Your example is quite reasonable. At 20 packets per flow, with no > loss you get 1, 2, 4, 8, 3 packets per RTT or complete transfer in > about 5 RTT. On average each TCP flow will get 20 packets / 5 RTT of > bandwidth until congestion of 4 packets/RTT (for 552/70 msec, this is > about 64 Kb/s). If the connection is temporarily overloaded by a > factor of 2, this must be reduced to 2 packets/RTT. If we drop 1 > packet in 20, roughly 35% of the flows go completely untouched > (0.95^20). Some 15% will drop one packet of the first 3 and timeout > and slow start, resulting in less than 20 packet / 3 seconds (3 > seconds >> 5*RTT). Some 60% will drop one packet of the 4th through > 20th, resulting in fast retransmit, no timeout, and linear growth in > window. If the 4th is dropped, the window is cut to 2, so next few > RTTs you get 2, 3, 4, 5, 3, or 8 RTTS (2 initial, 1 drop, 5 more). > This is probably not quite enough to slow things down. > > On a DS3 with 70 msec RTT and 1500 simultaneous flows of 20 packets > each (steady state such that the number of active flows remains about > 1500, roughly twice what a DS3 could support) you would need a drop > rate of on the order of 5% or more. 
> Alternatively, you could queue things up, doubling the delay to 140 msec, giving every flow the same slower rate (perfect fairness in your example), and having a zero drop rate.
>
> Which would you prefer? 140 msec and 0% loss or 70 msec and 5% loss? Delay is good. We want delay for elastic traffic! But not for real time: use RSVP and admission control, police at the ingress, and stick it on the front of the queue.
>
> In practice, I'd expect overload to be due to lots of flows, but not enough little guys to overload the link on their own (if there are, get a bigger pipe; we can say that and put it into practice). The overload will be due to a high baseline of little guys (20-packet flows, or a range of fairly small ones), plus some percentage of longer-duration flows capable of sucking up the better part of a T1, given half a chance. It is the latter that you want to slow down, and these are the ones that you *can* slow down with a fairly low drop rate.
>
> I leave it as an exercise for the reader to determine how RED fits into this picture (either one: my overload scenario, or Jim's, where all the flows are 20 packets in duration).
>
> The case of 400 flows on 4 Mb/s is an interesting (and difficult) one. I've suggested both allowing delay to get very large (i.e., as high as 2 seconds) and hacking the host implementation to reduce segment size to as low as 128 bytes when RTT gets huge or cwnd drops below 4 segments, holding the window to no less than 512 bytes (4 segments) in hopes that fast retransmit will almost always work even in 15-20% loss situations.
>
> Curtis

Curtis's original posting:

> In order to get X bandwidth on a given TCP flow, you need an average window size of X * RTT. Expressed in TCP segments, this is N = (X * RTT) / MSS (or, more correctly, the segment size in use rather than MSS).
> To sustain an average window of N segments, you must ideally reach a steady state where you cut cwnd (the congestion window) in half, then grow linearly, fluctuating between 2/3 and 4/3 of the target size. This means one drop every 2/3 N windows, so the DropRate, expressed in time, is one drop per 2/3 N * RTT. In one RTT, on average X * RTT of data flows. In practice you rarely drop at the perfect time, so the constant 2/3 (call it K) can be raised to 1-2. Since N = (X * RTT) / MSS and each window carries X * RTT of data, the data sent between drops is K * N * X * RTT, so DropRate = K * X * RTT * X * RTT / MSS. Units are b/s * sec * b/s * sec / b, or b. The DropRate expressed in bits can be converted to seconds or packets (divide by X or by MSS). This type of analysis is courtesy of the good folks at PSC (Matt, Jamshid, et al).
>
> For example, to get 40 Mb/s at 70 msec RTT with a 4096-byte MSS, you get one error about every 6 seconds (K=1), or 1 in 7,300 packets. If you look at 56 Kb/s and a 512-byte MSS, you get a very interesting result: you need one error every 66 msec, or 1 error in 0.9 packets. This gives a good incentive to increase delay. At 250 msec, you get one error in 11.7 packets (much better!).
>
> Another interesting point to note is that you need 3 duplicate ACKs for TCP fast retransmit to work, so your window must be at least 4 segments (and should be more). If you have a very large number of TCP flows, where on average people get less than 1200 baud or so, the delay you need to make TCP work well starts to exceed the magic 3-second boundary. This was discussed ad nauseam on end2end-interest. An important result is that you need more queueing than the delay-bandwidth product for severely congested links. Another is that there is a limit to the number of active TCP flows that can be supported per unit of bandwidth. One suggestion to address the latter problem is to further reduce segment size if cwnd is less than 4 segments and/or when estimated RTT gets into the seconds range.
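The drop-interval estimate in the posting can be tabulated directly. This is a minimal sketch, assuming segment sizes are given in bytes (as in the 4096 and 512 examples) and converted to bits so the units match the formula; K defaults to 1, as in those examples. The function names are illustrative, not from any tool.

```python
# Sketch of the PSC-style estimate from the posting:
#   N        = (X * RTT) / MSS               window in segments
#   DropRate = K * X * RTT * X * RTT / MSS   bits sent between drops
# Assumption: segment sizes are passed in bytes and converted to bits here.

def window_segments(rate_bps: float, rtt_s: float, seg_bytes: int) -> float:
    """Average window N, in segments, needed to sustain rate_bps at rtt_s."""
    return rate_bps * rtt_s / (seg_bytes * 8)


def drop_interval(rate_bps: float, rtt_s: float, seg_bytes: int,
                  k: float = 1.0) -> tuple[float, float]:
    """Return (seconds, packets) between drops for one TCP flow.

    k = 2/3 is the ideal sawtooth; the posting suggests 1-2 in practice.
    """
    seg_bits = seg_bytes * 8
    bits = k * (rate_bps * rtt_s) ** 2 / seg_bits  # DropRate in bits
    return bits / rate_bps, bits / seg_bits         # divide by X or by MSS


if __name__ == "__main__":
    cases = [(40e6, 0.070, 4096), (56e3, 0.070, 512), (56e3, 0.250, 512)]
    for rate, rtt, seg in cases:
        secs, pkts = drop_interval(rate, rtt, seg)
        n = window_segments(rate, rtt, seg)
        print(f"{rate / 1e3:g} Kb/s, {rtt * 1e3:g} ms, {seg}-byte segments: "
              f"N = {n:.2f}, one drop every {secs:.3f} s (1 in {pkts:.1f} packets)")
```

Because the interval grows with the square of X * RTT, raising the RTT from 70 to 250 msec at a fixed 56 Kb/s multiplies the packets between drops by (250/70)^2, about 12.8; that is the sense in which added delay buys a lower loss rate.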
>
> This analysis of how much loss is acceptable to TCP may not be outside the bounds of an informational RFC, but so far none exists.
>
> Curtis
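The four-segment floor behind the fast-retransmit argument, which runs through both of Curtis's messages, can also be turned around: given a flow's share of the link, how large must the RTT be for the window to hold four segments, and how small must segments be at a given RTT? A minimal sketch, assuming the per-flow rate is fixed and borrowing the 552-byte MSS and 128-byte floor from the figures in the thread; the function names are hypothetical and not part of any TCP stack.

```python
# Sketch of the four-segment floor: 3 duplicate ACKs need 3 segments
# delivered after the lost one, so the window must hold at least 4.
# Hypothetical helpers; the 552-byte MSS and 128-byte floor defaults
# are taken from figures quoted in the thread.

def min_rtt_for_window(rate_bps: float, seg_bytes: int, segments: int = 4) -> float:
    """RTT (seconds) at which a flow limited to rate_bps carries `segments` per window."""
    return segments * seg_bytes * 8 / rate_bps


def segment_size_for_window(window_bytes: int, mss: int = 552,
                            floor: int = 128, segments: int = 4) -> int:
    """Shrink the segment size so the window still holds `segments` segments.

    The result is clamped to [floor, mss]; at the floor, a smaller window
    simply holds fewer segments and fast retransmit may not fire.
    """
    return max(floor, min(mss, window_bytes // segments))


if __name__ == "__main__":
    # A flow getting roughly 1200 b/s.
    for seg in (552, 128):
        print(f"{seg}-byte segments: RTT >= {min_rtt_for_window(1200, seg):.1f} s")
    # A flow getting 10 Kb/s (4 Mb/s shared by 400 flows) at several RTTs.
    for rtt in (0.25, 0.5, 1.0, 2.0):
        window = int(10_000 * rtt / 8)  # bytes in flight at that rate and RTT
        print(f"RTT {rtt} s: window {window} B -> segment {segment_size_for_window(window)} B")
```

At roughly 1200 b/s, 552-byte segments would need an RTT of about 14.7 s to fit four segments in the window, while 128-byte segments need about 3.4 s, just past the 3-second mark mentioned in the posting.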