North American Network Operators Group

Re: Energy consumption vs % utilization?

  • From: Valdis.Kletnieks
  • Date: Tue Oct 26 14:45:30 2004

On Tue, 26 Oct 2004 13:52:51 EDT, "Gregory (Grisha) Trubetskoy" said:

> average your servers are 98% underutilized, you are wasting a lot of 

Remember in your analysis to include premature hardware failure due to too many
power cycles...

A server can *easily* "on average" be running at only 20-30% of capacity,
simply because requests arrive at essentially random times - so you have to
deal with the case where "average" over a minute is 20% of capacity for 600
hits (10/sec), but some individual seconds only have 1 hit, and others have 50
(at which point you're running with the meter spiked).
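To put numbers on that: if 10 hits/sec is 20% of capacity, the box tops out at roughly 50 hits/sec, so a 1-hit second leaves it about 2% busy and a 50-hit second pegs it. A minimal Python sketch of the arithmetic follows. It assumes utilization scales linearly with hits per second (a simplification), and the function names are made up for illustration.

```python
def implied_capacity(avg_hits_per_sec, avg_utilization_pct):
    """Hits/sec the server could sustain at 100%, assuming linear scaling."""
    return avg_hits_per_sec * 100 / avg_utilization_pct


def utilization_pct(hits_this_second, capacity):
    """Percent of capacity used in a single second with the given hit count."""
    return 100 * hits_this_second / capacity


# Figures from the example above: 600 hits in a minute, averaging 20% of capacity.
avg_rate = 600 / 60                         # 10 hits/sec
capacity = implied_capacity(avg_rate, 20)   # 50 hits/sec

for hits in (1, 10, 50):
    pct = utilization_pct(hits, capacity)
    print(f"{hits:2d} hits in one second -> {pct:.0f}% of capacity")
```

The average looks comfortable, but sizing for the average alone leaves no headroom for the 50-hit seconds.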

Time-of-day issues also get involved - you may need to have enough iron to
handle the peak load at 2PM, but be sitting mostly idle at 2AM. Unfortunately,
I've seen very few rack-mount boxes that support partial power-down to save
energy - if it's got 2 Xeon processors and 2G of memory, both CPUs and all the
memory cards are hot all the time...

There are also latency issues - if some CPUs on a node or some nodes in a cluster
are powered down, there is a timing lag between when you start firing them up
and when they're ready to go - so you need to walk the very fine line between
"too short a spike powers stuff up needlessly" (very bad for the hardware), and
"too much damping means you get bottlenecked while waiting for spin-up".
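The tradeoff can be sketched as a toy hysteresis controller in Python. This is not what any real cluster manager does. The class name, thresholds, and hold count are all hypothetical, chosen only to show how a hold period filters out short spikes at the cost of a delayed reaction.

```python
class PowerController:
    """Toy controller for one block of spare nodes (all values illustrative)."""

    def __init__(self, up_threshold=0.8, down_threshold=0.3, hold_samples=3):
        self.up_threshold = up_threshold      # load above this asks for power-up
        self.down_threshold = down_threshold  # load below this asks for power-down
        self.hold_samples = hold_samples      # consecutive samples required to act
        self.extra_on = False                 # are the spare nodes powered up?
        self.power_cycles = 0                 # how many times we flipped state
        self._streak = 0                      # consecutive samples asking for a flip

    def observe(self, load):
        """Feed one load sample (0.0 to 1.0); return whether spares are on."""
        if self.extra_on:
            wants_change = load < self.down_threshold
        else:
            wants_change = load > self.up_threshold
        # A single contrary sample resets the streak, so brief spikes are ignored.
        self._streak = self._streak + 1 if wants_change else 0
        if self._streak >= self.hold_samples:
            self.extra_on = not self.extra_on
            self.power_cycles += 1
            self._streak = 0
        return self.extra_on


spike = PowerController()
spike_states = [spike.observe(x) for x in (0.2, 0.95, 0.2, 0.2)]

sustained = PowerController()
sustained_states = [sustained.observe(x) for x in (0.9, 0.9, 0.9)]
```

With these made-up settings the one-sample spike never powers anything up, while the sustained load only gets its extra nodes on the third sample, and the boot time comes on top of that. Raising hold_samples saves power cycles and lengthens the bottleneck; lowering it does the reverse.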

(Been there, done that - there's a 1200-node cluster across the hall, and
there's no really good/easy way to ramp up all 1200 for big jobs and power down
800 nodes if there's only 400 nodes' worth of work handy.  So we end up leaving
it all fired up and let each node's "idle loop" be "good enough".)

If it was as easy as all that, we'd all be doing it already.. :)
