North American Network Operators Group

Baltimore train tunnels (was Re: Vulnerabilities of Interconnection)

  • From: Sean Donelan
  • Date: Sat Sep 07 21:51:02 2002

On Fri, 6 Sep 2002 [email protected] wrote:
> You also have the problem of cascading failures.  Just because there
> are redundant paths and alternate peering locations does not mean
> those facilities have the bandwidth to handle all the redirected
> traffic.  If A gets swamped you go to B, if the redirected traffic is
> too much for B then you go to C and so on - each time the amount of
> traffic increases and the available bandwidth decreases.  According to
> the analysis I've seen and run on the Baltimore incident this is
> the gist of how a few cut lines rippled across the Internet.  I would
> think Alex's scenario would have a bigger impact than that incident.

For some reason, I guess since Baltimore is near Washington DC, this
incident seems to have captured the imagination of folks in Washington DC.
Although some brand-name providers were impacted by this incident, it had
minimal impact on other providers. Essentially every major Internet
exchange point has failed at one time or another. In the past, there have
been simultaneous failures in at least three different locations.

The problem with your analysis is that's not what happens on the Internet.

One of the current issues of Internet traffic engineering is that traffic
doesn't roll over to alternate paths B or C when the primary path A
is congested.  Such overflow routing is a traditional design in the
switched telephone network, but it is not common in the Internet.
Internet traffic tends to follow the "best" available route, whether or
not that route is congested.
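
To make that concrete, here is a minimal sketch in Python of load-blind
route selection.  It is not any real routing protocol's decision process;
the "cost" field is a hypothetical stand-in for whatever metric a protocol
prefers, and the route names and load figures are made up for illustration.
The point is only that load never enters the choice, so a congested path A
stays selected until it actually goes down.

```python
def pick_route(routes):
    """Return the name of the lowest-cost route that is up.

    Load is deliberately ignored: congestion on the preferred route
    does not cause traffic to roll over to an alternate.
    """
    usable = [r for r in routes if r["up"]]
    if not usable:
        return None
    return min(usable, key=lambda r: r["cost"])["name"]


# Hypothetical paths: A is preferred but nearly saturated, B and C are idle.
routes = [
    {"name": "A", "cost": 10, "up": True, "load": 0.99},
    {"name": "B", "cost": 20, "up": True, "load": 0.10},
    {"name": "C", "cost": 30, "up": True, "load": 0.05},
]

chosen_congested = pick_route(routes)      # A is still chosen despite its load

routes[0]["up"] = False                    # only an outright failure of A ...
chosen_after_failure = pick_route(routes)  # ... moves traffic to B
```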

Unlike phone calls, TCP traffic doesn't occur in fixed bandwidth
increments. TCP traffic, 90% of Internet traffic, is elastic. By design,
TCP adjusts the traffic rate to keep the bottleneck congested.  As the
bottleneck moves, traffic reacts by increasing or decreasing the rate to
match the available capacity.  This feedback occurs independently of what
is happening on nearby traffic paths.  Even if there is available
capacity elsewhere, the current Internet design is not very good at
using it.  Some people view this as an inefficient use of available
capacity; other people view it as a self-protective mechanism.
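
The elastic behavior described above can be illustrated with a toy
additive-increase/multiplicative-decrease loop in Python.  This is a rough
sketch under loud assumptions, not a model of real TCP: all flows react in
lockstep, there are no round-trip times or queues, and the function name,
parameters, and capacity numbers are invented for the example.  It shows
the aggregate rate hunting around whatever the bottleneck capacity is, and
dropping on its own when that capacity shrinks.

```python
def simulate_aimd(capacity_per_round, n_flows=4, increase=1.0, decrease=0.5):
    """Toy synchronized AIMD: return the aggregate rate after each round.

    Each round, if the flows together exceed the bottleneck capacity,
    every flow multiplies its rate by `decrease`; otherwise every flow
    adds `increase`.  Nothing here looks at capacity on other paths.
    """
    rates = [1.0] * n_flows
    totals = []
    for capacity in capacity_per_round:
        if sum(rates) > capacity:
            rates = [r * decrease for r in rates]   # back off on congestion
        else:
            rates = [r + increase for r in rates]   # probe for more capacity
        totals.append(sum(rates))
    return totals


# Hypothetical bottleneck: 100 units for 200 rounds, then cut to 40 units.
schedule = [100.0] * 200 + [40.0] * 200
totals = simulate_aimd(schedule)

avg_before = sum(totals[100:200]) / 100   # settled sawtooth around the 100 link
avg_after = sum(totals[300:400]) / 100    # settled sawtooth around the 40 link
```

In this sketch the senders never need to be told the new capacity; the
congestion feedback alone pulls the aggregate rate down to match it.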

In today's Internet, the type of cascading failure you postulated probably
won't happen.  The design goal of the Internet is not to keep every part
of the network operating under every condition, but failures in part of
the network should not disrupt other parts of the network.

That's why during the Baltimore train tunnel incident you saw some
providers with severe problems in parts of their network, but other
providers didn't
experience any slowdowns in their networks.  I wouldn't be surprised if
a few people even experienced an improvement in their traffic that day.

There are vendors trying to sell systems which will "steer" traffic
through alternate paths seeking to avoid congestion.  In addition, there
are things like IEPREP which are seeking to bypass the congestion feedback
controls for selected traffic.  It is unclear to me what impact these
will have on Internet traffic during a crisis.  It is possible these
improvements will in fact make the Internet more brittle.