North American Network Operators Group


Re: Extreme congestion (was Re: inter-domain link recovery)

  • From: Fred Baker
  • Date: Thu Aug 16 16:30:18 2007



On Aug 16, 2007, at 7:46 AM, <[email protected]> wrote:
In many cases, yes. I know of a certain network that ran with 30% loss for a matter of years because the option didn't exist to increase the bandwidth. When it became reality, guess what they did.

How many people have noticed that when you replace a circuit with a higher-capacity one, the traffic on the new circuit is suddenly greater than 100% of the old one? Obviously this doesn't happen all the time, such as when you have a 40% threshold for initiating a circuit upgrade, but if you do your upgrades when they are 80% or 90% full, this does happen.

Well, so let's do a thought experiment.


First, that Infocom paper I mentioned says that they measured the variation in delay pop-2-pop at microsecond granularity with hyper-synchronized clocks, and found that with 90% confidence the variation in delay in their particular optical network was less than 1 ms. Also with 90% confidence, they noted "frequent" (frequency not specified, but apparently pretty frequent, enough that one of the authors later worried in my presence about offering VoIP services on it) variations on the order of 10 ms. For completeness, I'll note that they had six cases in a five-hour sample where the delay changed by 100 ms and stayed there for a period of time, but we'll leave that observation for now.

Such spikes are not difficult to explain. If you think of TCP as an on-off function, a wave function with some similarities to a sine wave, you might ask yourself what the sum of a bunch of sine waves with slightly different periods is. It is also a wave function, and occasionally has a very tall peak. The study says that TCP synchronization happens in the backbone. Surprise.
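To make the analogy concrete, here's a little Python toy (mine, purely illustrative - the flow count, periods, and phases are all made up):

import math, random

random.seed(1)
num_flows = 50
# each "flow" cycles with a period near 1.0 but not exactly 1.0,
# starting at a random point in its cycle
periods = [1.0 + random.uniform(-0.05, 0.05) for _ in range(num_flows)]
phases = [random.uniform(0, 2 * math.pi) for _ in range(num_flows)]

steps = 20000
peak = 0.0
running_total = 0.0
for step in range(steps):
    t = step * 0.005
    # each flow contributes an on-off-ish load between 0 and 1
    load = sum(0.5 * (1 + math.sin(2 * math.pi * t / p + ph))
               for p, ph in zip(periods, phases))
    running_total += load
    peak = max(peak, load)

print("average aggregate load:", round(running_total / steps, 1))
print("peak aggregate load:   ", round(peak, 1))

The average sits near half the flow count, but every so often enough of the cycles line up that the sum spikes well above it.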

Now, let's say you're running your favorite link at 90% and get such a spike. What happens? The tip of it gets clipped off - a few packets get dropped. Those TCPs slow down momentarily. The more that happens, the more frequently TCPs get clipped and back off.

Now you upgrade the circuit and the TCPs stop getting clipped. What happens?

The TCPs don't slow down. They use the bandwidth you have made available instead.

In your words, "the traffic on the new circuit is suddenly greater than 100% of the old one".
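If it helps, here is a crude AIMD toy in Python (again mine, purely illustrative - 20 synthetic flows with made-up increase and decrease constants) showing the same thing: double the capacity, and the same senders simply climb until they carry more than the old circuit could have held at all.

def simulate(flows=20, rounds=3000, upgrade_at=1500):
    rates = [1.0] * flows          # per-flow send rate, arbitrary units
    carried = []
    for r in range(rounds):
        capacity = 100.0 if r < upgrade_at else 200.0   # "circuit upgrade"
        offered = sum(rates)
        carried.append(min(offered, capacity))
        if offered > capacity:
            rates = [x * 0.5 for x in rates]    # clipped: the TCPs back off
        else:
            rates = [x + 0.1 for x in rates]    # otherwise: keep probing upward
    before = sum(carried[1000:1500]) / 500
    after = sum(carried[2500:3000]) / 500
    return before, after

before, after = simulate()
print("average carried traffic on old circuit:", round(before))
print("average carried traffic after doubling:", round(after))

In this toy the old circuit carries on the order of 75 units and the upgraded one on the order of 150 - more than 100% of the old circuit's capacity, without a single new user being added.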

In 1995 at the NGN conference, I found myself on a stage with Phill Gross, then a VP at MCI. He was basically reporting on this phenomenon and apologizing to his audience. MCI had put in an OC-3 network - gee-whiz stuff then - and had some of the links run too close to full before starting to upgrade. By the time they had two OC-3s in parallel on every path, there were some paths with a standing 20% loss rate. Phill figured that doubling the bandwidth again (622 Mbps everywhere) on every path throughout the network should solve the problem for that remaining 20% of load, and started with the hottest links. To his surprise, with the standing load > 95% and experiencing 20% loss at 311 Mbps, doubling the rate to 622 Mbps resulted in links with a standing load > 90% and 4% loss. He still needed more bandwidth. After we walked offstage, I explained TCP to him...

Yup. That's what happens.
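If you do the back-of-envelope on Phill's numbers (my arithmetic, and it crudely treats delivered traffic as load times one-minus-loss), the delivered traffic more than doubled across the upgrade:

# rough goodput = capacity * utilization * (1 - loss rate)
oc3_pair = 311.0   # Mbps, two OC-3s in parallel
oc12 = 622.0       # Mbps

before = oc3_pair * 0.95 * (1 - 0.20)   # > 95% load, 20% loss
after = oc12 * 0.90 * (1 - 0.04)        # > 90% load,  4% loss

print("delivered before upgrade: about", round(before), "Mbps")
print("delivered after upgrade:  about", round(after), "Mbps")

Roughly 236 Mbps before versus roughly 537 Mbps after - the demand the backed-off TCPs were holding in reserve was there all along.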

Several folks have commented on p2p as a major issue here. Personally, I don't think of p2p as the problem in this context, but it is an application that exacerbates the problem. Bottom line, the common p2p applications like to keep lots of TCP sessions flowing, and have lots of data to move. Also (and to my small mind this is egregious), they make no use of locality - if the content they are looking for is both next door and half-way around the world, they're perfectly happy to move it around the world. Hence, moving a file into a campus doesn't mean that the campus has the file and will stop bothering you. I'm pushing an agenda in the open source world to add some concept of locality, with the purpose of moving traffic off ISP networks when I can. I think the user will be just as happy or happier, and folks pushing large optics will certainly be.
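To make "some concept of locality" concrete, a peer selector only needs something along these lines (a hypothetical sketch, not what any particular client actually does; the addresses and prefixes are made up):

import ipaddress

def sort_peers_by_locality(peers, local_prefixes):
    # prefer peers inside the campus/ISP prefixes; everyone else comes after
    nets = [ipaddress.ip_network(p) for p in local_prefixes]

    def is_local(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in nets)

    return sorted(peers, key=lambda a: 0 if is_local(a) else 1)

peers = ["203.0.113.9", "192.0.2.20", "198.51.100.7"]
print(sort_peers_by_locality(peers, ["192.0.2.0/24", "198.51.100.0/24"]))

Fetch from the neighbors first, and the file that just crossed the ocean doesn't have to cross it again for the next user on the same campus.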