North American Network Operators Group


Re: trollage (Re: Akamai server reliability)

  • From: Joel Jaeggli
  • Date: Mon Nov 28 15:53:11 2005

On Mon, 28 Nov 2005, Chris Owen wrote:

As far as I can tell the only thing that will get a box replaced is if it
can't be booted/pinged.  We've pointed out dead CPU fans before (even on
the incoming replacement boxes) and they've never seemed to care.  If it
runs it runs.  If it doesn't they replace the entire box.
Having built a fair number of machines to live for 5 years or longer in data centers I will never visit, I find there's relatively little that you want to triage onsite on a rackmount PC. Drives in hot-plug enclosures and removable power supply modules are about it. Smart hands are good for racking and stacking, swapping disks, recabling the OOB, swapping media and so forth. It's not really a good use of someone else's time to have them performing experimental surgery on PCs. Much better to simply ship out another one and ship the old one back in the same box.

Decent modern 1U chassis still have sufficient airflow to remain adequately cool with a couple of fans failed. Further, there are now enough sensors in a PC to tell when you are getting into trouble: RPM indicators for all the fans; intake, processor and output temperatures; thermal sensors in each of the drives; etc. Our success rate at identifying machines before they fail has gotten substantially better over time.
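
To make the sensor-based triage concrete, here is a minimal sketch in Python of the kind of check described above. It is an illustration only, not the tooling we actually run: the Reading structure, the triage function and every threshold are assumptions, and real limits depend on the chassis, the drives and the environment.

```python
from dataclasses import dataclass

# All thresholds are illustrative assumptions, not vendor specifications.
MIN_FAN_RPM = 1000          # below this a fan is treated as failed
MAX_FAILED_FANS = 2         # chassis is assumed to tolerate this many dead fans
MAX_CPU_TEMP_C = 70.0
MAX_DRIVE_TEMP_C = 50.0
MAX_AIR_TEMP_RISE_C = 20.0  # exhaust minus intake temperature


@dataclass
class Reading:
    """One snapshot of a machine's sensors (hypothetical structure)."""
    host: str
    fan_rpm: list[float]
    intake_c: float
    cpu_c: float
    exhaust_c: float
    drive_c: list[float]


def triage(r: Reading) -> tuple[str, list[str]]:
    """Return (action, reasons), where action is 'ok', 'watch' or 'replace'."""
    reasons = []

    # Failed fans alone are tolerable up to a point.
    failed_fans = sum(1 for rpm in r.fan_rpm if rpm < MIN_FAN_RPM)
    if failed_fans:
        reasons.append(f"{failed_fans} fan(s) below {MIN_FAN_RPM} rpm")

    # Any temperature over its limit means cooling is no longer adequate.
    overheating = False
    if r.cpu_c > MAX_CPU_TEMP_C:
        overheating = True
        reasons.append(f"processor at {r.cpu_c:.0f} C")
    hot_drives = sum(1 for t in r.drive_c if t > MAX_DRIVE_TEMP_C)
    if hot_drives:
        overheating = True
        reasons.append(f"{hot_drives} drive(s) above {MAX_DRIVE_TEMP_C:.0f} C")
    if r.exhaust_c - r.intake_c > MAX_AIR_TEMP_RISE_C:
        overheating = True
        reasons.append("exhaust much hotter than intake")

    if overheating or failed_fans > MAX_FAILED_FANS:
        return "replace", reasons   # ship a new box before this one dies
    if reasons:
        return "watch", reasons     # degraded but still adequately cooled
    return "ok", reasons


if __name__ == "__main__":
    sample = Reading("edge-01", [5200, 0, 0], 22.0, 55.0, 35.0, [38.0, 40.0])
    print(triage(sample))
```

The point of the three-way result is that a box with a couple of dead fans but normal temperatures only gets watched, while one that is actually running hot gets a replacement shipped before it falls over.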

Given all their redundancy I suppose that is probably the way to go.


Chris Owen                ~ Garden City (620) 275-1900 ~  Lottery (noun):
President                 ~ Wichita     (316) 858-3000 ~    A stupidity tax
Hubris Communications Inc ~       ~

Joel Jaeggli  	       Unix Consulting 	       joel[email protected]
GPG Key Fingerprint:     5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2