North American Network Operators Group

RE: redundancy [was: something about arrogance]

  • From: Phil Rosenthal
  • Date: Tue Jul 30 12:54:07 2002

I have in the past single-homed to Level(3) and Verio, each in their own
facility in NC.
In that time, both carriers averaged about 1 solid hour of downtime a
month (some months were worse, some were better). Some of the outages
were on the order of 8 solid hours (Verio) or 4 hours (Level3).
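
To put those figures in availability terms, here is a rough sketch. The
730-hour month is an assumption (8760 hours / 12), and the 4-hour and
8-hour cases assume that single outage was the only downtime in that
month, which the numbers above do not actually say.

```python
# Assumption: an "average" month of 8760 / 12 = 730 hours.
HOURS_PER_MONTH = 730.0


def availability(downtime_hours, period_hours=HOURS_PER_MONTH):
    """Return the fraction of the period during which the link was up."""
    return 1.0 - downtime_hours / period_hours


# Downtime figures taken from the message above; labels are illustrative.
cases = [
    ("typical month, about 1 hour down", 1.0),
    ("month with a 4-hour outage", 4.0),
    ("month with an 8-hour outage", 8.0),
]

for label, hours in cases:
    print(f"{label}: {availability(hours) * 100:.2f}% available")
```

Under these assumptions a typical month works out to roughly 99.86%
availability, and a month with one 8-hour outage to roughly 98.9%.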

We did not run HSRP with Level3, so it may be difficult to guarantee the
uptime of one GigE handoff... But we ran HSRP with Verio, and of all the
outages (about 20 of them), maybe two were avoided because of HSRP.

Other than that, it was all downtime.

At this point, I couldn't conceive of single-homing to any uplink anymore.

--Phil

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of
Pedro R Marques
Sent: Tuesday, July 30, 2002 6:23 AM
To: [email protected]
Cc: [email protected]
Subject: redundancy [was: something about arrogance]



Brad writes:
 >        I'm probably demonstrating my ignorance here (and my stupidity
 > in stepping into a long-standing highly charged argument), but I'm
 > completely missing something.  For reasons of redundancy & 
 > reliability, even if you were to buy bandwidth in only one location, 
 > wouldn't you want to buy it from at least two different providers?
 
 >        If you buy bandwidth from two different providers at two 
 > different locations, this would seem to me to be a good way to 
 > provide backup in case one provider or one location goes
 > Tango-Uniform, and you could always backhaul the bandwidth for the 
 > site/provider that is down.

Several other posters have mentioned reasons why redundancy between 2
different connections to separate providers is not, in most situations,
the preferable approach but i would like to add another point/question...

When considering redundancy/reliability/etc it is important to think
about what kind of failures you want to protect against vs the cost of
doing so.

It is my impression, from reading this list and tidbits of gossip, that 
the most common causes of failure are:
- link failure
- equipment failure (routers mostly), both software and hardware
- configuration errors

All of those are much more frequent than the failure of an entire ISP (a
transit provider). It is expected, i believe, of a competent ISP to
provide redundancy both within a POP and in its inter-POP links/equipment
and its connections to upstreams/peers.

As such, probably the first level of redundancy that an origin AS
(non-transit) would look at would be with the intent to protect from
failures of its external connectivity link and termination equipment
(routers on both ends).

To do so, one can look at:
- 2 external links to distinct providers
- 2 external links to the same provider
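
The trade-off between those two options can be sketched with a toy
probability model. Everything here is an assumption made for
illustration: `p_edge` (the chance that one access link or its
terminating routers is down), `p_isp` (the chance that a whole provider
is down), the independence of all failures, and the sample numbers,
which are hypothetical rather than measured.

```python
def path_down(p_edge, p_isp):
    """Probability that one path (edge plus its provider) is unusable."""
    return p_isp + (1.0 - p_isp) * p_edge


def same_provider_down(p_edge, p_isp):
    """Two independent links into ONE provider.

    Connectivity is lost if the provider fails, or if the provider is
    up but both edge links/routers fail at the same time.
    """
    return p_isp + (1.0 - p_isp) * p_edge ** 2


def two_providers_down(p_edge, p_isp):
    """One link into each of TWO independent providers.

    Connectivity is lost only if both paths are unusable at once.
    """
    return path_down(p_edge, p_isp) ** 2


# Hypothetical inputs: edge failures far more likely than a whole-ISP failure.
p_edge, p_isp = 0.01, 0.0001

print(f"single link:        {path_down(p_edge, p_isp):.6f}")
print(f"2 links, same ISP:  {same_provider_down(p_edge, p_isp):.6f}")
print(f"2 links, 2 ISPs:    {two_providers_down(p_edge, p_isp):.6f}")
```

In this model the gap between the two dual-link options is roughly
`p_isp`, so how much a second provider buys depends on how often a
provider-wide outage really happens. The model ignores convergence
time and correlated failures such as shared fiber paths.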

While i can't speak to the economics part of the equation (although i
would expect it to be cheaper to buy an additional link than to connect
to a different provider), from a point of view of restoration,
protecting a path with an alternate path from the same provider is
certainly an approach that gives you much better convergence times.

This comes from the fact that in terms of network topology, the distance
between 2 links to the same upstream is much shorter than 2 links to
different upstreams. While, if you protect a path with an alternate path
to the same ISP you can expect convergence to occur within the IGP
convergence times of your provider, with 2 different providers you need
global BGP convergence to occur.

This gets longer depending on how topologically distant your 2
upstreams are... for instance attempting to protect a path to an ISP
with very wide connectivity with a protection path from one with very
limited connectivity would be a particularly bad case, as you would have
to wait for the path announced by the larger ISP to be withdrawn n times
from all its peering points and for the protection path to make its way
through in replacement.

What i perceive to be the standard practice of origin-only ASes
attempting to multi-home to 2 distinct providers is counter-intuitive to
me... from several points of view (convergence times, load on the global
routing system, complexity of management, etc), dual connectivity to
different routers of the same provider (using distinct physical paths)
would seem to me to make more sense.

Unless the main concern is that the upstream ISP fails entirely... which,
given the fact that it tends to have frontpage honors on the NYTimes
these days, does not appear to be an all too common occurrence (i mean
operationally, not financially - clarification added to dispel potential
humorous remarks).

So, my question to the list is, why is multi-homing to 2 different
providers such a desirable thing? What is the motivation and why is it
preferred over multiple connections to the same upstream?

Is the main motivation not so much reliability but having a shorter
as-path to more destinations? This would appear to me to be a clear
advantage since that doesn't necessarily reflect in better quality of
interconnection.

My apologies in advance if these seem to be stupid questions...

thanks,
  Pedro.