North American Network Operators Group


Re: San Francisco Power Outage

  • From: Owen DeLong
  • Date: Tue Jul 24 22:55:02 2007

On Jul 24, 2007, at 4:57 PM, Patrick Giagnocavo wrote:

On Jul 24, 2007, at 6:54 PM, Seth Mattinen wrote:

I have a question: does anyone seriously accept "oh, power trouble" as a reason your servers went offline? Where's the generators? UPS? Testing said combination of UPS and generators? What if it was important? I honestly find it hard to believe anyone runs a facility like that and people actually *pay* for it.

Sad that the little Telcove DC here in Lancaster, PA, which Level3 bought a few months ago, has weekly full-on generator tests where 100% of the load is transferred to the generator, while apparently large DCs that charge premium rates do not.

I am not familiar with the operational details of 365 Main, but I suspect
that they, like most datacenters, do have weekly generator and transfer
test procedures.

However, there are lots of things that can go wrong that are not covered by
generators and transfer tests:

A power distribution system can suffer a cascading failure in a number
of ways. For example, someone can connect things out of phase during a
maintenance procedure in such a way that everything is fine until a
transfer occurs, and then all hell breaks loose. (Ever seen what happens
when a large CRAC unit starts trying to run backwards because the
three-phase rotation is out of order?)
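To illustrate the phase rotation point: a three-phase motor turns one way when its terminals see the phases in a cyclic order of A-B-C, and the other way when any two leads are swapped. The following is only a minimal sketch of that rule; the function name `rotation_direction` and the "forward"/"reverse" labels are hypothetical, chosen for illustration.

```python
def rotation_direction(terminal_order: str) -> str:
    """Classify the phase sequence seen at a motor's three terminals.

    terminal_order lists which supply phase (A, B, or C) lands on
    terminals 1, 2, 3, e.g. "ABC" or "ACB". Treating A-B-C as the
    forward sequence is an assumption made for this illustration.
    """
    if sorted(terminal_order) != ["A", "B", "C"]:
        raise ValueError("expected each of A, B, C exactly once")
    # Cyclic shifts of ABC keep the same rotation; any single swap of
    # two leads produces one of the other three orders and reverses it.
    forward_orders = ("ABC", "BCA", "CAB")
    return "forward" if terminal_order in forward_orders else "reverse"


if __name__ == "__main__":
    for order in ("ABC", "BCA", "ACB", "BAC"):
        print(order, rotation_direction(order))
```

A miswired feed of this kind can go unnoticed as long as the load stays on the source that was wired correctly, which is why it may only show up at transfer.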

There are also things that can go wrong in the transfer process (like
putting the UPS and generators on the bus together some degrees
out of phase).
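As a rough illustration of why a phase error at transfer matters: for two sources of equal voltage and frequency, the voltage across the open tie point is 2 * V * |sin(delta / 2)|, where delta is the phase offset between them. The sketch below just evaluates that formula; the function name and the 480 V figure are assumptions for the example, not details of any particular facility.

```python
import math


def paralleling_voltage_difference(v_rms: float, phase_error_deg: float) -> float:
    """RMS voltage between two equal-magnitude, same-frequency sources
    that are phase_error_deg apart (idealized: no impedance, no
    frequency slip)."""
    delta = math.radians(phase_error_deg)
    # |V - V*e^(j*delta)| = 2 * V * |sin(delta / 2)|
    return 2.0 * v_rms * abs(math.sin(delta / 2.0))


if __name__ == "__main__":
    # 480 V is an assumed nominal voltage for illustration only.
    for error_deg in (0, 10, 60, 180):
        dv = paralleling_voltage_difference(480.0, error_deg)
        print(f"{error_deg:>3} degrees out of phase: {dv:.1f} V across the tie")
```

Under these idealized assumptions, a 60 degree error already puts the full source voltage across the tie, and 180 degrees puts twice that.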

Most of these things become far more likely and far harder to avoid as
the amount of power and the number of units in the system increase.

I'm not defending the situation at 365 Main. I don't have any first-hand
knowledge. I'm just saying that the mere fact that they have been dark
for several hours today does not necessarily mean that they don't do
weekly full-on generator tests.

I have no idea what the root cause of today's outage is. I will be
interested in hearing any actual details from any credible source,
but I'm betting that right now any such credible source is a bit busy.