North American Network Operators Group


Re: Extreme congestion (was Re: inter-domain link recovery)

  • From: Fred Baker
  • Date: Thu Aug 16 16:30:18 2007



On Aug 16, 2007, at 7:46 AM, <[email protected]> wrote:
In many cases, yes. I know of a certain network that ran with 30% loss for a matter of years because the option didn't exist to increase the bandwidth. When it became reality, guess what they did.

How many people have noticed that when you replace a circuit with a higher-capacity one, the traffic on the new circuit is suddenly greater than 100% of the old one? Obviously this doesn't happen all the time, such as when you have a 40% threshold for initiating a circuit upgrade, but if you do your upgrades when they are 80% or 90% full, this does happen.

Well, so let's do a thought experiment.


First, that Infocom paper I mentioned says that they measured the variation in delay pop-2-pop at microsecond granularity with hyper-synchronized clocks, and found that with 90% confidence the variation in delay in their particular optical network was less than 1 ms. Also with 90% confidence, they noted "frequent" (frequency not specified, but apparently pretty frequent, enough that one of the authors later worried in my presence about offering VoIP services on it) variations on the order of 10 ms. For completeness, I'll note that they had six cases in a five-hour sample where the delay changed by 100 ms and stayed there for a period of time, but we'll leave that observation for now.

Such spikes are not difficult to explain. If you think of TCP as an on-off function, a wave function with some similarities to a sine wave, you might ask yourself what the sum of a bunch of sine waves with slightly different periods is. It is also a wave function, and occasionally has a very tall peak. The study says that TCP synchronization happens in the backbone. Surprise.
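To make the analogy concrete, here's a little Python toy (mine, purely illustrative - the flow count, periods, and phases are all made up):

import math, random

random.seed(1)
num_flows = 50
# each "flow" cycles with a period near 1.0 but not exactly 1.0,
# starting at a random point in its cycle
periods = [1.0 + random.uniform(-0.05, 0.05) for _ in range(num_flows)]
phases = [random.uniform(0, 2 * math.pi) for _ in range(num_flows)]

steps = 20000
peak = 0.0
running_total = 0.0
for step in range(steps):
    t = step * 0.005
    # each flow contributes an on-off-ish load between 0 and 1
    load = sum(0.5 * (1 + math.sin(2 * math.pi * t / p + ph))
               for p, ph in zip(periods, phases))
    running_total += load
    peak = max(peak, load)

print("average aggregate load:", round(running_total / steps, 1))
print("peak aggregate load:   ", round(peak, 1))

The average sits near half the flow count, but every so often enough of the cycles line up that the sum spikes well above it.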

Now, let's say you're running your favorite link at 90% and get such a spike. What happens? The tip of it gets clipped off - a few packets get dropped. Those TCPs slow down momentarily. The more that happens, the more frequently TCPs get clipped and back off.

Now you upgrade the circuit and the TCPs stop getting clipped. What happens?

The TCPs don't slow down. They use the bandwidth you have made available instead.

In your words, "the traffic on the new circuit is suddenly greater than 100% of the old one".
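If it helps, here is a crude AIMD toy in Python (again mine, purely illustrative - 20 synthetic flows with made-up increase and decrease constants) showing the same thing: double the capacity, and the same senders simply climb until they carry more than the old circuit could have held at all.

def simulate(flows=20, rounds=3000, upgrade_at=1500):
    rates = [1.0] * flows          # per-flow send rate, arbitrary units
    carried = []
    for r in range(rounds):
        capacity = 100.0 if r < upgrade_at else 200.0   # "circuit upgrade"
        offered = sum(rates)
        carried.append(min(offered, capacity))
        if offered > capacity:
            rates = [x * 0.5 for x in rates]    # clipped: the TCPs back off
        else:
            rates = [x + 0.1 for x in rates]    # otherwise: keep probing upward
    before = sum(carried[1000:1500]) / 500
    after = sum(carried[2500:3000]) / 500
    return before, after

before, after = simulate()
print("average carried traffic on old circuit:", round(before))
print("average carried traffic after doubling:", round(after))

In this toy the old circuit carries on the order of 75 units and the upgraded one on the order of 150 - more than 100% of the old circuit's capacity, without a single new user being added.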

In 1995 at the NGN conference, I found myself on a stage with Phill Gross, then a VP at MCI. He was basically reporting on this phenomenon and apologizing to his audience. MCI had put in an OC-3 network - gee-whiz stuff then - and had some of the links run too close to full before starting to upgrade. By the time they had two OC-3s in parallel on every path, there were some paths with a standing 20% loss rate. Phill figured that doubling the bandwidth again (622 Mbps everywhere) on every path throughout the network should solve the problem for that remaining 20% of load, and started with the hottest links. To his surprise, with the standing load > 95% and experiencing 20% loss at 311 Mbps, doubling the rate to 622 Mbps resulted in links with a standing load > 90% and 4% loss. He still needed more bandwidth. After we walked offstage, I explained TCP to him...

Yup. That's what happens.
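If you do the back-of-envelope on Phill's numbers (my arithmetic, and it crudely treats delivered traffic as load times one-minus-loss), the delivered traffic more than doubled across the upgrade:

# rough goodput = capacity * utilization * (1 - loss rate)
oc3_pair = 311.0   # Mbps, two OC-3s in parallel
oc12 = 622.0       # Mbps

before = oc3_pair * 0.95 * (1 - 0.20)   # > 95% load, 20% loss
after = oc12 * 0.90 * (1 - 0.04)        # > 90% load,  4% loss

print("delivered before upgrade: about", round(before), "Mbps")
print("delivered after upgrade:  about", round(after), "Mbps")

Roughly 236 Mbps before versus roughly 537 Mbps after - the demand the backed-off TCPs were holding in reserve was there all along.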

Several folks have commented on p2p as a major issue here. Personally, I don't think of p2p as the problem in this context, but it is an application that exacerbates the problem. Bottom line, the common p2p applications like to keep lots of TCP sessions flowing, and have lots of data to move. Also (and to my small mind this is egregious), they make no use of locality - if the content they are looking for is both next door and half-way around the world, they're perfectly happy to move it around the world. Hence, moving a file into a campus doesn't mean that the campus has the file and will stop bothering you. I'm pushing an agenda in the open source world to add some concept of locality, with the purpose of moving traffic off ISP networks when I can. I think the user will be just as happy or happier, and folks pushing large optics will certainly be.
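To make "some concept of locality" concrete, a peer selector only needs something along these lines (a hypothetical sketch, not what any particular client actually does; the addresses and prefixes are made up):

import ipaddress

def sort_peers_by_locality(peers, local_prefixes):
    # prefer peers inside the campus/ISP prefixes; everyone else comes after
    nets = [ipaddress.ip_network(p) for p in local_prefixes]

    def is_local(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in nets)

    return sorted(peers, key=lambda a: 0 if is_local(a) else 1)

peers = ["203.0.113.9", "192.0.2.20", "198.51.100.7"]
print(sort_peers_by_locality(peers, ["192.0.2.0/24", "198.51.100.0/24"]))

Fetch from the neighbors first, and the file that just crossed the ocean doesn't have to cross it again for the next user on the same campus.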