North American Network Operators Group

Re: One-element vs two-element design

  • From: Scott McGrath
  • Date: Sat Jan 17 14:47:44 2004

I personally favor the N+1 design model, as it allows maintenance to be
performed on network elements without causing outages, which keeps the
customers happy.

In many instances you can leverage the N+1 model to share the load between
the devices, thereby increasing network capacity.  As an additional benefit,
in the event of an element failure your network degrades gracefully rather
than failing hard and requiring an "all hands" operation to get it back
online.  This tends to reduce the operational costs of your network even
though the implementation cost is higher, so over the lifetime of the
network the overall cost is lower.  For example, service contracts can be
NBD (next business day) rather than 24x7x2.
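
One caveat on load sharing can be made concrete.  The following is a
minimal sketch, not taken from any vendor tool, and it assumes that load
is spread evenly across identical devices and that survivors pick up a
failed device's share evenly.  The function names are hypothetical.

```python
def max_safe_utilization(devices):
    """Highest per-device utilization (0.0 to 1.0) at which the remaining
    devices can still absorb the load of one failed device.

    Assumes identical devices and perfectly even load distribution.
    """
    if devices < 2:
        raise ValueError("need at least two devices to survive a failure")
    return (devices - 1) / devices


def utilization_after_failure(devices, utilization):
    """Per-device utilization once one device has failed and its load has
    been spread evenly over the survivors."""
    if devices < 2:
        raise ValueError("no survivors to take over the load")
    return utilization * devices / (devices - 1)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, "devices: keep each below",
              round(max_safe_utilization(n), 3))
```

Under these assumptions a load-sharing pair should run each device at no
more than half capacity; push it past that and a single failure turns
graceful degradation into congestion.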

The N+1 model also takes into account the simple fact that stuff breaks!
I was reading the FIPS standards for machine room design one day, and an
entire page was devoted to "ALL EQUIPMENT WILL FAIL EVENTUALLY".  This is a
lesson which is often forgotten.

This is why commercial airliners have multiple engines.  Even though the
system is less reliable overall than a well-designed single-engine craft
(more engines means more engine failures), the failure of a single
component does not entail the catastrophic failure of the entire system.
(There are exceptions to this, but the overall concept does work.)

In the end it comes down to a reliable vs. a resilient network.  In a
reliable network, components fail infrequently but have catastrophic
failure modes.  In a resilient network, component failure is taken as a
given, but the overall system reliability is much higher than that of a
reliable network, since a component failure does not equal a functional
failure.
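
The reliable vs. resilient comparison can be put into rough numbers.  This
is only a sketch: the MTBF and MTTR figures below are invented for
illustration, and the parallel formula assumes the devices fail
independently and that failover is instantaneous and always works, which
real networks do not guarantee.

```python
HOURS_PER_YEAR = 8760


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of one device: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(device_availability, devices=2):
    """Availability of a group where any single device can carry the
    service.  Assumes independent failures and perfect failover."""
    return 1.0 - (1.0 - device_availability) ** devices


def downtime_hours_per_year(a):
    """Expected hours of outage per year at availability a."""
    return (1.0 - a) * HOURS_PER_YEAR


# Hypothetical figures, chosen only to illustrate the argument.
# "Reliable": one very good box on a fast-response contract.
reliable = availability(mtbf_hours=100_000, mttr_hours=4)
# "Resilient": two ordinary boxes on a next-business-day contract.
ordinary = availability(mtbf_hours=20_000, mttr_hours=24)
resilient = parallel_availability(ordinary, devices=2)

if __name__ == "__main__":
    print("reliable  single box:",
          round(downtime_hours_per_year(reliable), 3), "h/year down")
    print("resilient pair      :",
          round(downtime_hours_per_year(resilient), 3), "h/year down")
```

With these made-up inputs each ordinary box is individually worse than the
reliable one, yet the pair comes out ahead, because both boxes have to be
down at the same time for the service to fail.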


                            Scott C. McGrath

On Fri, 16 Jan 2004 [email protected] wrote:

> One key consideration you should think about is the ability to perform
> maintenance on redundant devices in the N+1 model without impacting the
> availability of the network.
>
> Brent
>
>
>
>
> Timothy Brown <[email protected]>
> Sent by: [email protected]
> 01/16/2004 10:14 PM
>
>
>         To:     [email protected]
>         cc:
>         Subject:        One-element vs two-element design
>
>
>
> I fear this may be a mother of a debate.
>
> In my (short?) career, I've been involved in several designs, some
> successful, some less so.  I've recently been asked to contribute a
> design for one of the networks I work on.  The design brings with it a
> number of challenges, but also, unlike a greenfield network, has a lot
> of history.
>
> One of the major decisions I'm being faced with is a choice between
> one-element or two-element design.  When I refer to elements, what I
> really mean to say is N or N+1.  For quite some time now, vendors have
> been improving hardware to the point where most components in a given
> device, with the exception of a line card, can be made redundant.  This
> includes things like routing and switching processors, power supplies,
> busses, and even, in the case of vendor J and several others, the
> possibility of in-flight restarts of particular portions of the
> software, either as part of scheduled maintenance or to correct a
> problem.
>
> I have traditionally been of the school of thought that states that it
> is best to have two devices of equal power and on the same footing,
> and, in multiple-site configurations, four devices of equal power and
> equal footing.  I feel like a safe argument to make is N+1, so that is
> the philosophy that I tend to adopt.  N+2 or N...whatever doesn't seem
> to add a lot of additional security to the network's model of
> availability.  Redundancy adds complexity, but I prefer to think of
> this in terms of, "Well, I can manage software or design complexity in
> my configurations, but I can't manage the loss of a single device which
> holds my network together."  Now I must view this assertion in the
> context of better-designed hardware and cheap spares-on-hand.
>
> Of course, like many other folks, I have tried to drink as deeply as I
> can from the well of knowledge.  I've perused at length Cisco Press'
> High Availability Network Fundamentals, and understand MTBF
> calculations and some of the design issues in building a highly
> available network.  But from a cost perspective, it seems that a
> single, larger box may be able to offer me as much redundancy as two
> equally configured boxes handling the same traffic load.  Of course,
> there's that little demon on my shoulder that tells me that I could
> always lose a complete device due to a power issue or short, and then
> I'd be up a creek.
>
> We have a history of adopting the N+1 model on the specific network I'm
> talking about, and it has worked very well so far in the face of
> occasional software failures by a vendor we have occasionally ridiculed
> here on nanog-l.  However, in considering a comprehensive redesign,
> another vendor offers significantly more software stability, so I'm
> re-evaluating the need for multiple devices.
>
> My mind's more or less already made up, but I'd like to hear the design
> philosophies of other members of the operational community when
> adopting an N+1 approach.  In particular, I'd love to hear of a
> catastrophic operational failure which either proves or disproves
> either of the potential options.
>
> Tim
>
> ObDisclaimer:  Please contact me off-list if you're okay with your
> thoughts on this matter being published in a book targeted to the
> operations community.