North American Network Operators Group

Re: Operate until failure

  • From: Nathan Stratton
  • Date: Mon Jan 08 09:23:44 2001

On 8 Jan 2001, Sean Donelan wrote:

> Is there any consistency among network operators how they operate
> their networks when they know a possibility of imminent failure
> exists?
>
> 1. Do you attempt to preserve service as long as possible, including
> running equipment to the point of destruction?

To extend run time as long as you can:

1) Power down as much hardware as possible
2) Pull all redundant cards
3) Pull fan trays
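
A rough way to see what the steps above buy you is to estimate battery
run time before and after shedding load. This is only a sketch under
stated assumptions: a constant load, a fixed usable fraction of the
battery's rated capacity, and no allowance for discharge-rate effects.
The plant size and the per-item wattages are hypothetical numbers, not
measurements from any real site.

```python
def runtime_hours(capacity_ah, bus_voltage, load_watts, usable_fraction=0.8):
    """Rough battery run time in hours for a constant load.

    usable_fraction is an assumed derating; real batteries deliver less
    capacity at high discharge rates, which this ignores.
    """
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    usable_wh = capacity_ah * bus_voltage * usable_fraction
    return usable_wh / load_watts


# Hypothetical 48 V plant with a 400 Ah battery string.
CAPACITY_AH = 400
BUS_VOLTAGE = 48

# Hypothetical loads in watts.
essential = {"core router": 1200}
sheddable = {"spare hardware": 270, "redundant cards": 300, "fan trays": 150}

full_load = sum(essential.values()) + sum(sheddable.values())
shed_load = sum(essential.values())

before = runtime_hours(CAPACITY_AH, BUS_VOLTAGE, full_load)
after = runtime_hours(CAPACITY_AH, BUS_VOLTAGE, shed_load)
print(f"run time at full load:  {before:.1f} h")
print(f"run time after shedding: {after:.1f} h")
```

The estimate says nothing about the heat risk of running without fan
trays, so it only covers the power side of the trade-off.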

> 2. Do you attempt to minimize recovery time by shutting down equipment
> to a "safe" condition before failure?

Depends on the outage. If you think you can make it, then you don't. Things
like pulling fan trays can give you a lot more run time, but may damage
hardware, so you need to watch it. If it looks like you may make it, you may
want to override the low voltage disconnects on your DC systems. It may
toast your batteries, but if it will get you through an outage it may be
worth it.

> If you are running a database/transaction oriented system, I would expect
> you want to put the database into a stable condition.  On the other hand,
> if you are operating mostly communication equipment, you would want to
> leave it operating as long as possible.

What I like to do is shut down the redundant database so you know you have
something to fall back on. You then run the other database into the ground.

> I'm aware of a variety of proprietary software shutdown programs associated
> with UPS vendors.  But I'm wondering do any "open standards" exist for
> initiating soft shutdowns?

It very much depends on what you are doing. I like having control
over what I kill in my network. Of course the best plan is to never
let the above happen, but I don't care how redundant your system is; if
you have been in this business a long time, you will reach a crash event.
Knowing how to deal with it can extend your run time considerably.
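
One way to keep that control is to decide the kill order ahead of time
instead of during the outage. The sketch below is an illustration only:
the Load type, the priority numbers, and the wattages are all
hypothetical, and it just picks what to power down, lowest priority
first, until the remaining load fits a power budget you choose.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    watts: float
    priority: int  # lower number = shed first (hypothetical scale)


def shed_plan(loads, budget_watts):
    """Return names to power down, lowest priority first,
    stopping as soon as the remaining load fits the budget."""
    total = sum(load.watts for load in loads)
    to_shed = []
    for load in sorted(loads, key=lambda item: item.priority):
        if total <= budget_watts:
            break
        to_shed.append(load.name)
        total -= load.watts
    return to_shed


# Hypothetical inventory; the standby database goes down early so there
# is a clean copy to fall back on.
inventory = [
    Load("core router", 1200, 9),
    Load("standby database", 400, 2),
    Load("spare hardware", 600, 1),
    Load("redundant cards", 300, 3),
]

print(shed_plan(inventory, budget_watts=1600))
```

If the budget cannot be met even after shedding everything, the plan
simply lists every load, which is a signal to think about a full
controlled shutdown instead.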


><>
Nathan Stratton				CTO, Exario Networks, Inc.
[email protected]                     [email protected]
http://www.robotics.net                 http://www.exario.net