North American Network Operators Group


redundancy [was: something about arrogance]

  • From: Pedro R Marques
  • Date: Tue Jul 30 05:25:42 2002


Brad writes:
> I'm probably demonstrating my ignorance here (and my stupidity in stepping into a long-standing highly charged argument), but I'm completely missing something. For reasons of redundancy & reliability, even if you were to buy bandwidth in only one location, wouldn't you want to buy it from at least two different providers?

> If you buy bandwidth from two different providers at two different locations, this would seem to me to be a good way to provide backup in case one provider or one location goes Tango-Uniform, and you could always backhaul the bandwidth for the site/provider that is down.

Several other posters have mentioned reasons why redundancy between 2 different connections to separate providers is not, in most situations, the preferable approach, but I would like to add another point/question...

When considering redundancy/reliability/etc., it is important to think about what kinds of failures you want to protect against vs. the cost of doing so.

It is my impression, from reading this list and tidbits of gossip, that the most common causes of failure are:
- link failure
- equipment failure (routers mostly), both software and hardware
- configuration errors

All of those are much more frequent than the failure of an entire ISP (a transit provider). It is expected, I believe, of a competent ISP to provide redundancy both within a POP and in its inter-POP links/equipment, as well as in its connections to upstreams/peers.

As such, probably the first level of redundancy that an origin AS (non-transit) would look at would be intended to protect against failures of its external connectivity link and termination equipment (routers on both ends).

To do so, one can look at:
- 2 external links to distinct providers
- 2 external links to the same provider
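To make the availability side of that choice concrete, here is a minimal Python sketch of a toy series/parallel availability model. It assumes independent failures, one customer router, one circuit and one provider edge router per path, and purely hypothetical availability figures; `series`, `parallel` and `options` are illustrative names, not taken from any tool. The only term that differs between the two options is whether the provider sits inside each path or is shared by both.

```python
from math import prod


def series(*avail: float) -> float:
    """Availability of components that must all be up at once."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """Availability of redundant paths: down only if every path is down."""
    return 1.0 - prod(1.0 - a for a in avail)


def options(link: float, router: float, provider: float) -> tuple[float, float]:
    """Return (distinct_providers, same_provider) availability.

    Each path is customer router + circuit + provider edge router in series.
    Failures are assumed to be independent.
    """
    path = series(router, link, router)
    # Distinct providers: each provider is part of its own path.
    distinct = parallel(series(path, provider), series(path, provider))
    # Same provider: two independent paths, but the provider is shared.
    same = series(parallel(path, path), provider)
    return distinct, same


# Hypothetical figures, not measurements.
distinct, same = options(link=0.999, router=0.9995, provider=0.99999)
print(f"distinct providers: {distinct:.9f}")
print(f"same provider:      {same:.9f}")
```

Setting `provider=1.0` makes the two results identical, so in this model any availability advantage of distinct providers comes entirely from the whole-provider failure term; convergence time, which the model ignores, is a separate question.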

While I can't speak to the economics part of the equation (although I would expect it to be cheaper to buy an additional link than to connect to a different provider), from a restoration point of view, protecting a path with an alternate path from the same provider is certainly an approach that gives you much better convergence times.

This comes from the fact that, in terms of network topology, the distance between 2 links to the same upstream is much shorter than between 2 links to different upstreams. If you protect a path with an alternate path to the same ISP, you can expect convergence to occur within the IGP convergence time of your provider; with 2 different providers, you need global BGP convergence to occur.

This takes longer the more topologically distant your 2 upstreams are. For instance, attempting to protect a path to an ISP with very wide connectivity with a protection path from one with very limited connectivity would be a particularly bad case: you would have to wait for the path announced by the larger ISP to be withdrawn n times, from all of its peering points, and for the protection path to make its way through in its place.
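The topological-distance argument can be illustrated with a deliberately crude Python model. It assumes an undirected AS graph with hypothetical AS names, a fixed hypothetical delay per AS hop, and that every AS must both hear the withdrawal from the primary upstream and receive the replacement path from the backup upstream. It ignores BGP path exploration and per-router timers, so it is optimistic. In the same-provider case, by contrast, the rest of the Internet sees no change and restoration is bounded by the provider's IGP convergence.

```python
from collections import deque

Graph = dict[str, list[str]]


def hop_counts(graph: Graph, source: str) -> dict[str, int]:
    """Breadth-first AS-hop distance from `source` to every reachable AS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def bgp_failover_seconds(graph: Graph, primary: str, backup: str,
                         per_hop_delay: float) -> float:
    """Crude estimate of global BGP convergence after the primary link fails.

    Every AS must hear the withdrawal coming from `primary` and the
    replacement path coming from `backup`; each AS hop costs
    `per_hop_delay` seconds. Path exploration is ignored, so this is
    optimistic. Assumes a connected graph with symmetric adjacency.
    """
    from_primary = hop_counts(graph, primary)
    from_backup = hop_counts(graph, backup)
    worst_hops = max(max(from_primary[n], from_backup[n]) for n in graph)
    return worst_hops * per_hop_delay


# Hypothetical topology: "big" is widely connected, "small" hangs off "mid".
topology: Graph = {
    "big": ["t1", "t2", "mid"],
    "t1": ["big"],
    "t2": ["big"],
    "mid": ["big", "small"],
    "small": ["mid"],
}

DELAY = 30.0  # hypothetical seconds per AS hop
print(bgp_failover_seconds(topology, "big", "mid", DELAY))    # prints 60.0
print(bgp_failover_seconds(topology, "big", "small", DELAY))  # prints 90.0
```

Here a backup upstream adjacent to the primary (`mid`) needs 2 hops' worth of delay, while a backup with very limited connectivity further out (`small`) needs 3.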

What I perceive to be the standard practice among origin-only ASes, multi-homing to 2 distinct providers, seems counter-intuitive to me from several points of view (convergence times, load on the global routing system, complexity of management, etc.). Dual connectivity to different routers of the same provider (using distinct physical paths) would seem to me to make more sense.

Unless, that is, the main concern is that the upstream ISP fails entirely... which, given that such an event tends to earn front-page honors in the NYTimes these days, does not appear to be an all too common occurrence (I mean operationally, not financially; clarification added to dispel potential humorous remarks).

So, my question to the list is: why is multi-homing to 2 different providers such a desirable thing? What is the motivation, and why is it preferred over multiple connections to the same upstream?

Is the main motivation not so much reliability as having a shorter AS path to more destinations? This would not appear to me to be a clear advantage, since a shorter AS path doesn't necessarily reflect a better quality of interconnection.
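On the AS-path question, a minimal sketch of why path length says little about quality: in the BGP decision process, the shortest AS_PATH is preferred right after the highest LOCAL_PREF, and neither step looks at latency or loss. The sketch implements only those two steps (real implementations have vendor-specific steps before them and more tie-breakers after), and the `rtt_ms` field, the upstream names and the default LOCAL_PREF of 100 are hypothetical; the AS numbers come from the range reserved for documentation.

```python
from dataclasses import dataclass


@dataclass
class Route:
    upstream: str
    as_path: list[int]
    local_pref: int = 100  # assumed default LOCAL_PREF
    rtt_ms: float = 0.0    # hypothetical measured quality; never consulted


def best_path(routes: list[Route]) -> Route:
    """Pick a route using only two BGP decision steps:
    highest LOCAL_PREF first, then shortest AS_PATH."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


via_a = Route("provider-a", [64496, 64499], rtt_ms=120.0)
via_b = Route("provider-b", [64497, 64498, 64499], rtt_ms=40.0)

# The shorter AS_PATH wins even though the other path has a lower RTT.
print(best_path([via_a, via_b]).upstream)  # prints provider-a

# Only local policy (here, a higher LOCAL_PREF) makes the faster path win.
via_b.local_pref = 200
print(best_path([via_a, via_b]).upstream)  # prints provider-b
```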

My apologies in advance if these seem to be stupid questions...

thanks,
Pedro.