North American Network Operators Group

[fwd] Rats take down Stanford ...

  • From: Paul Ferguson
  • Date: Tue Oct 22 09:38:55 1996

A follow-up thought on redundancy issues.

- paul

[snip]


>Date: Mon, 21 Oct 1996 12:54:05 -0700 (PDT)
>From: [email protected]
>Subject: RISKS DIGEST 18.54

[snip]

>
>Date: Fri, 18 Oct 96 11:03 EST
>From: William Hugh Murray <[email protected]>
>Subject: Re: Rats take down Stanford ... (RISKS-18.53)
>
>PGN's request for redundancy brings to mind the story of the infrastructure
>computer center in Trumbull, Connecticut.  It is an old story but bears
>repeating.
>
>Seems that a squirrel got into a transformer and brought down the external
>power supply.  The UPS kicked in, engine generators came on line, and the
>center operated in this mode for about an hour and a half.  At the end of
>that time the external power was restored.  The external power, the UPS, and
>the engine generators went into a deadly embrace.  The whole thing came down
>and would not come back up.
>
>I take two lessons from this.  First, redundancy adds some complexity and a
>lot of redundancy adds a lot of complexity.  At some point the redundancy
>begins to introduce failure modes and failure events that would not have
>existed in its absence.  There is an upper bound to such redundancy.
>
>Second, test redundant systems through to resumption of normal operations.
>In this case, the operators had tested to ensure that the redundant systems
>would come online in the event of a failure of the primary system.  They had
>not tested to see what would happen when the primary system was restored to
>normal operation.
>
>Who would have even thought about it?  I confess that I would not have.
>
>William Hugh Murray, New Canaan, Connecticut
>

[snip]
