North American Network Operators Group

Re: "Simple" Multi-Homing ? (was Re: CIDR Report)

  • From: Chris Williams
  • Date: Tue May 16 15:00:38 2000

> Those are examples of contingencies that should be covered in your
> service level agreement.  The SLA should hold the provider responsible
> for any loss of income, recovery costs, or whatever, should they be the
> ones to screw up.  Assuming you trust the provider to honour the SLA
> you've worked out with them then your level of risk is mitigated even if
> it's not done exactly in the way you might prefer under ideal
> circumstances -- we are talking about (hopefully exceptional) events
> here, after all!  If you don't trust your provider to honour their
> agreements then I'd humbly suggest you find one you can trust!  ;-)

Most SLAs I've seen, at least for smaller customers, are of the type "if
we're down for a day, you get a free week", which means in general your
maximum remedy for an outage is the cost of a T1 for a month. I think it
is pretty plausible that a company which only needed a T1 of bandwidth
could lose a lot more than $1500 worth of business if they were down for
a day or two.
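
To put rough numbers on that, here is a minimal sketch of the credit arithmetic. The $1500 monthly fee and the "one day down, one week free" rule come from the paragraph above; the 30-day month and the function name sla_credit are assumptions made for illustration, not terms from any real contract.

```python
def sla_credit(monthly_fee, days_down, credit_days_per_day_down=7, days_in_month=30):
    """Service credit under a 'down for a day, get a free week' SLA.

    The credit is capped at one month's fee, matching the observation
    that the maximum remedy is the cost of the circuit for a month.
    """
    credit_days = min(days_down * credit_days_per_day_down, days_in_month)
    return monthly_fee * credit_days / days_in_month


if __name__ == "__main__":
    monthly_fee = 1500.0  # T1 for a month, the figure used above
    for days_down in (1, 2, 5):
        print(days_down, "day(s) down ->", sla_credit(monthly_fee, days_down))
```

However long the outage lasts, the credit never exceeds the monthly fee, so any business lost beyond that amount is not covered.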

> All but the last are also examples where basic link-level redundancy
> will help to avoid total outages.  You don't need an ASN and full BGP
> route peering just to remain connected when your T1 goes down!  Please
> let's solve the right problem here!

All three were examples of miscommunication causing someone at the
provider to intentionally suspend or terminate service. It would hardly
matter how many links you had to the provider when they chose to shut
you down.

> If such action is specified in your contract then you've accepted that
> risk and you should mitigate it appropriately (eg. by regularly testing
> and securing your servers!).  I'd hope that if you did have redundant
> routing then your other provider would also cut you off for the same
> reason and at approximately the same time!

The situation I was trying to highlight was one where such an incident
occurs, and the customer quickly and appropriately responds, but one of
their providers overreacts and at some point during the process suspends
service. It is really not about who is right, but about the fact that
any given provider is run by a small group of humans, and that any given
group of humans is to some degree unpredictable. If you only have one
provider, it only takes one human mishandling a situation to take you
offline.

I would hope that most reasonable providers would _not_ cut off a
customer immediately if they were found to be a source of misbehavior,
but first ask them politely to fix the problem (with the exception, of
course, of immediately blocking any traffic that was actively
interfering with someone else's operation). If you have discovered a way
to make a machine that is guaranteed to be perfectly secure, I might
reconsider this position. ;P

> I think there's another alternative that's being missed here too that'll
> satisfy the majority of needs of quite a few people, if not most.  It
> should be trivial to obtain only the minimum necessary address space
> from both providers and truly multi-home the servers requiring
> redundancy!  For outgoing connections you simply flip the default route
> on each server as necessary (perhaps using automated tools) and for
> incoming connections you just put multiple A RRs in your DNS for each
> service requiring redundancy.  Load balancing opportunities spring to
> mind here too!

Although I agree that this is a possible solution, I think at some point
it would become awfully hard to manage -- also, it only addresses a
subset of the situations requiring multihoming.
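
For the incoming side, the multiple-A-record approach only helps if the client is willing to walk the list of addresses. A minimal sketch of that client behaviour follows; the function names and the injectable connect parameter are hypothetical, and it assumes one A record per provider's address space.

```python
import socket


def resolve_a_records(hostname):
    """Return the IPv4 addresses (A records) the resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def connect_first_reachable(addresses, port, connect=socket.create_connection,
                            timeout=3.0):
    """Try each address in order; return (address, connection) for the first
    one that accepts the connection."""
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as exc:
            # This address (and perhaps the provider behind it) is unreachable;
            # fall through to the next A record.
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")
```

A client that stops after the first address gets no benefit from the second record, which is one reason this covers only some of the cases.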

Do you know of any software to help implement this type of solution? I
can imagine how to script up the default-route swapping pretty easily on
a Unix box, but AFAIK it would likely require a reboot under NT, and I'm
not sure how you would go about automating it even then.
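
For the Unix case, the default-route swapping could be sketched roughly as below. This assumes a Linux box with the iproute2 "ip" command and Linux-style ping flags; the gateway addresses and function names are hypothetical. Pinging the next-hop gateway only proves the local link is up, not that the provider is actually carrying traffic, so a real check would want to probe something further out.

```python
import subprocess


def is_reachable(gateway):
    """Return True if a single ping to the gateway succeeds (Linux ping flags assumed)."""
    result = subprocess.run(["ping", "-c", "1", "-W", "2", gateway],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def pick_gateway(gateways, current, probe=is_reachable):
    """Keep the current gateway while it is healthy; otherwise return the
    first healthy alternative, or None if every gateway fails the probe."""
    if current in gateways and probe(current):
        return current
    for gateway in gateways:
        if gateway != current and probe(gateway):
            return gateway
    return None


def route_command(gateway):
    """Build the command that points the default route at the given gateway."""
    return ["ip", "route", "replace", "default", "via", gateway]


if __name__ == "__main__":
    gateways = ["192.0.2.1", "198.51.100.1"]  # hypothetical, one per provider
    chosen = pick_gateway(gateways, current=gateways[0])
    if chosen is not None:
        print("would run:", " ".join(route_command(chosen)))
```

Run from cron every minute or so, with subprocess.run(route_command(chosen)) in place of the print, this gives a crude failover. Staying on the current gateway while it is healthy avoids flapping between providers.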

Maybe a good way to go about it would be to set up a box to do reverse
NAT for incoming connections to either set of server IPs, and then
round-robin between IP spaces for outgoing connections? I think this
could be set up with IPF under *BSD/Linux; I'm not familiar enough with
NAT under IOS to know how hard it would be to do with a Cisco router.
This would have the advantage of simplifying the server configurations,
and there should really be something in the way of a firewall/filter in
front of them anyhow.
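
The outgoing half of that idea, stripped of any particular NAT implementation, is just a round-robin over the two address spaces that skips whichever one is currently unusable. A minimal sketch of the selection logic follows; the class name and addresses are hypothetical, and this is not IPF or IOS configuration.

```python
from itertools import cycle


class SourcePool:
    """Round-robin over one source address per provider, skipping any
    address that has been marked down."""

    def __init__(self, addresses):
        self._addresses = list(addresses)
        self._down = set()
        self._ring = cycle(self._addresses)

    def mark_down(self, address):
        self._down.add(address)

    def mark_up(self, address):
        self._down.discard(address)

    def next_source(self):
        """Return the source address to use for the next outgoing connection."""
        # Look at each address at most once per call so we never loop forever.
        for _ in range(len(self._addresses)):
            candidate = next(self._ring)
            if candidate not in self._down:
                return candidate
        raise RuntimeError("no provider address space is usable")


if __name__ == "__main__":
    pool = SourcePool(["192.0.2.1", "198.51.100.1"])  # hypothetical addresses
    print([pool.next_source() for _ in range(4)])
    pool.mark_down("192.0.2.1")  # first provider has cut us off
    print([pool.next_source() for _ in range(2)])
```

New connections alternate between providers while both are up and all go out through the survivor when one is marked down. Connections already established through the failed provider would still be lost.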

The only real disadvantage I can see of this solution is that the
load-balancing is not topology-sensitive -- on the other hand, if you
weren't going to receive full views anyway, or if both providers end up
connecting to the tier-1 backbones in the same place, this is a moot
point, and you are actually better off with round-robin load-balancing.