North American Network Operators Group

Re: San Francisco Power Outage

  • From: Stephen Wilcox
  • Date: Wed Jul 25 07:16:44 2007

On Tue, Jul 24, 2007 at 11:57:37PM +0000, Paul Vixie wrote:
> 
> [email protected] (Seth Mattinen) writes:
> 
> > I have a question: does anyone seriously accept "oh, power trouble" as a 
> > reason your servers went offline? Where's the generators? UPS? Testing 
> > said combination of UPS and generators? What if it was important? I 
> > honestly find it hard to believe anyone runs a facility like that and 
> > people actually *pay* for it.
> > 
> > If you do accept this is a good reason for failure, why?
> 
> sometimes the problem is in the redundancy gear itself.  PAIX lost power
> twice during its first five years of operation, and both times it was due
> to faulty GFI in the UPS+redundancy gear.  which had passed testing during
> construction and subsequently, but eventually some component just wore out.

I had an issue with exactly that 7 or 8 years ago at Via Networks. The switchover gear shorted and died horrifically, leading to an outage that lasted well through the night (something like 16 hours in total). As it was a Friday evening, it was difficult to get people on site promptly.

The lesson learned was 'the big switch': a huge thing that took the weight of two adults to move, but it meant that, should something similar occur, we could manually transfer the whole building's power directly to the generator.

I doubt such a beast would scale to the power loads of a large datacentre, though; then again, those are generally not on a single grid/UPS feed.

Steve