North American Network Operators Group

Re: Tornados in Ashburn (Equinix affected)

  • From: Robert E. Seastrom
  • Date: Sun Sep 19 11:30:18 2004

Sean Donelan <[email protected]> writes:

>> 1) Good that they [seemed] to have maintained partial power.
>
> It would be interesting to find out what happened to the two UPSes that
> apparently failed.  Was it something that exceeded the design, i.e. a
> lightning strike greater than X joules?  Or something else?  Equinix
> tests the heck out of their systems, but there is always the potential
> for a problem.

Where did you hear this?  If it was posted to NANOG, I missed it.

>> 2) Good that they restored cooling [power to the blowers?] relatively
>> quickly. By the graph someone posted and their message, it looks like
>> their chillers were on an unaffected system, but their blowers weren't
>> [as in, were affected].
>
> The initial spike looks normal, although a bit bigger than is comfortable.
> Chiller plants and compressors take several minutes to reset and restart
> when the backup generators come online.  The storm may have had some
> impact on the recovery because the temperature appears to take a long time
> to stabilize.

If this is to be expected and normal, then a statement to that effect
("Some customers may note a transient temperature spike of as much as
10 degrees C on their equipment due to designed-in characteristics of
an unplanned transfer of the chiller plant to backup power") in the
customer announcement would have gone a long way towards allaying
fears and creating positive spin.  A statement that the "chillers are
OK", when your inlet temperature has just spiked 9 degrees and is
currently sitting six degrees high, is simply disingenuous.

Anyway, based on my information (including a couple of phone calls at
the time), suggesting that everything was nominal would be an overly
charitable assessment of the situation.

>> 3) Good that they seemed to be able to bring together enough
>> knowledgeable folks quickly to resolve the problems that did occur
>> relatively quickly.
>
> Yep, whatever the problem, restoration that quickly tends to indicate
> their team was on the ball.  Stuff will always fail.  The real test is
> how quickly is it fixed.

Absolutely.  In case it was not clear in my original message, let me
state for the record:

1) I don't have a problem with facilities being screwed up due to Acts
of God that are outside of the design parameters of the facility.  If
an Airbus on short final to Runway 19R at Dulles magically fell out of
the sky on top of Equinix, that would just be spectacularly bad luck,
not Equinix's fault.

1a) In the words of a friend of mine who grew up in Texas, regarding
tornadoes: "The odds of being in the path are actually quite low; the
consequences of being in the path are extremely high".  An F2 tornado,
while perhaps not impressive to our friends from the Great Plains,
is capable of causing substantial damage.

1b) No substitute for site diversity if your project is important
enough to justify the cost.

2) Under the circumstances, I think the Equinix staff did an excellent
job of bringing things under control quickly.  I'm sure glad this
happened during the day and not at night or on a weekend when, due to
cost-cutting measures, they have maybe one tech, two max, on duty.

3) I believe that the statements made by Equinix to its customers so
far are outside the acceptable and expectable envelope of positive
spin to which Sean alluded in a previous message.  We're paying
customers, and when things go south we deserve frankness and full
disclosure, not a pep talk.

                                        ---Rob