North American Network Operators Group


Re: NANOG 40 agenda posted

  • From: Paul Vixie
  • Date: Mon Jun 04 03:39:58 2007

two replies here.  i ([email protected]) said:

> > quagga ospf6d works great, and currently lacks only a health check API.

Donald Stahl <[email protected]> answered:

> Health checks are unfortunately the most important aspect of a LB for some
> people.

understood.
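Because ospf6d has no health check API, a health check has to live outside the routing daemon. The Python sketch below covers only the decision logic and assumes a common pattern that this thread does not describe: an external script probes the local service and withdraws the service route after several consecutive failures. It re-advertises the route after several consecutive successes, so one lost probe does not cause route flapping. The class name and the `fall`/`rise` thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Decides whether a server should keep advertising its service route.

    fall: consecutive failed probes before withdrawing (hypothetical default)
    rise: consecutive good probes before re-advertising (hypothetical default)
    """
    fall: int = 3
    rise: int = 2
    advertised: bool = True
    _streak: int = 0  # consecutive probes that disagree with the current state

    def update(self, probe_ok: bool) -> bool:
        """Feed one probe result; return True if the advertised state flipped."""
        if probe_ok == self.advertised:
            # Probe agrees with the current state, so reset the streak.
            self._streak = 0
            return False
        self._streak += 1
        threshold = self.fall if self.advertised else self.rise
        if self._streak >= threshold:
            self.advertised = not self.advertised
            self._streak = 0
            return True
        return False


if __name__ == "__main__":
    state = HealthState()
    # Simulated probe results. A real script would query the service and, on
    # a flip, add or remove the service address so the local OSPF daemon
    # starts or stops announcing it (that mechanism is an assumption here).
    for ok in [True, False, False, False, True, True]:
        if state.update(ok):
            print("advertise" if state.advertised else "withdraw")
```

The hysteresis makes a server tolerate short glitches. The cost is that a dead server keeps attracting traffic for roughly `fall` probe intervals before its route is withdrawn.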

> Can you elaborate on where you use ECMP and specifics about your
> implementation that might interest people?

i could, but joe abley already did, and i wouldn't want to plagiarize him.
plz see <http://www.isc.org/pubs/tn/index.pl?tn=isc-tn-2004-1.html>.

---

Colm MacCarthaigh <[email protected]> answered:

> If you're load-balancing N nodes, and 1 node dies, the distribution hash
> is re-calced and TCP sessions to all N are terminated simultaneously. 
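The Python sketch below illustrates Colm's point. It assumes the router picks a next hop by naive hash-modulo-N. Real implementations, including CEF, may work differently. The sketch counts how many flows that were on *healthy* servers get remapped when one of ten servers disappears. For contrast, it also runs rendezvous (highest-random-weight) hashing, a swapped-in technique that nobody in the thread mentions. Flow strings and server names are hypothetical stand-ins for real 5-tuples and next hops.

```python
import hashlib


def flow_hash(flow: str, salt: str = "") -> int:
    """Stable 64-bit hash of a flow identifier, optionally salted."""
    digest = hashlib.sha256((salt + "|" + flow).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def modulo_pick(flow: str, servers: list[str]) -> str:
    # Naive ECMP-style choice: hash modulo the number of next hops.
    return servers[flow_hash(flow) % len(servers)]


def rendezvous_pick(flow: str, servers: list[str]) -> str:
    # Highest-random-weight hashing: every server scores the flow and the
    # top score wins, so removing a server only moves the flows it owned.
    return max(servers, key=lambda s: flow_hash(flow, salt=s))


def moved_fraction(pick, flows, before, after, dead) -> float:
    """Fraction of flows not on the dead server that land somewhere new."""
    survivors = [f for f in flows if pick(f, before) != dead]
    moved = sum(1 for f in survivors if pick(f, before) != pick(f, after))
    return moved / len(survivors)


if __name__ == "__main__":
    servers = [f"srv{i}" for i in range(10)]   # hypothetical next hops
    dead = "srv3"
    remaining = [s for s in servers if s != dead]
    flows = [f"flow-{i}" for i in range(20000)]  # stand-ins for 5-tuples
    for name, pick in (("modulo", modulo_pick), ("rendezvous", rendezvous_pick)):
        print(name, round(moved_fraction(pick, flows, servers, remaining, dead), 3))
```

Under these assumptions, the modulo scheme remaps close to 90% of the flows that were on healthy servers. Rendezvous hashing remaps none of them. This is the difference between "most sessions reset" and "only the dead server's sessions reset".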

i could just say that since i'm serving mostly UDP i don't care about this,
but then i wouldn't have a chance to say that paying the complexity, bug,
and training costs of an extra in-path box powered 24x365.24 doesn't weigh
well against the failure rate of the load-balanced servers.  somebody could
drop an anvil on one of my servers twice a day (so, 730 times per year) and
i would still come out ahead, given that most TCP traffic comes from web
browsers and many users will click "Reload" before giving up.  then there's
CEF, which i think keeps existing flows stable even through an OSPF recalc.
finally, there's the fact that we see less than one server failure per month
among the 100 or so servers we've deployed behind OSPF ECMP.
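A few lines of Python can check the arithmetic in the paragraph above. The function names are illustrative only, and the inputs are the figures quoted in the paragraph. "less than one server failure per month" gives only an upper bound of 12 failures per year across the roughly 100 servers.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.24


def powered_hours_per_year() -> float:
    """Hours per year an in-path box must stay up ("24x365.24")."""
    return HOURS_PER_DAY * DAYS_PER_YEAR


def anvil_failures_per_year(per_day: int = 2, days: int = 365) -> int:
    """The hypothetical 'anvil twice a day' failure rate."""
    return per_day * days


def observed_failure_ceiling(per_month: float = 1.0, months: int = 12) -> float:
    """Upper bound implied by 'less than one server failure per month'."""
    return per_month * months


if __name__ == "__main__":
    print(f"powered hours per year: {powered_hours_per_year():.2f}")
    print(f"anvil failures per year: {anvil_failures_per_year()}")
    ceiling = observed_failure_ceiling()
    print(f"observed failures per year: < {ceiling:g}")
    print(f"anvil scenario vs observed ceiling: {anvil_failures_per_year() / ceiling:.1f}x")
```

The anvil scenario works out to roughly 60 times the observed failure ceiling. The in-path box, meanwhile, must stay powered for about 8,766 hours a year.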

i know a lot of people who get paid well for building and selling and
supporting Extra Powered Boxes, and a lot of other people who will never
get fired for buying one... but that doesn't make it right.