North American Network Operators Group

Re: multihoming without BGP

  • From: Paul A Vixie
  • Date: Wed Jun 11 02:05:56 1997

> Of course, the downsides of using the interface-default hack are:
> 
> 1) it does not guarantee shortest path for the packets (unless someone
> has hacked together an lbnamed version that talks to gated and sees
> which interface has a shorter path to customer <x> based on number of
> AS hops before it answers the DNS query).

Shorter AS paths are a silly way to choose a path for a connection.  *IF*
BGP carried all the end-to-end bandwidth and delay stuff that EIGRP does,
it might be possible to make this decision intelligently.  But only if all
nets described by a routing element were internally homogeneous -- that is,
only one exit gateway rather than different exit gateways in each region.

None of these requirements hold in the case of BGP.  While BGP is a fine
way to route packets, it's a horrid way to select paths for connections.

The right answer, as I said when I first described this to the NANOG list
a while back, is in an upcoming product.  "ifdefault" is the free part of
the idea and it's something I hope to see system vendors supporting.

> 2) It uses a separate address for each interface (not important for a
> single box, but a room full of boxes, say, 50 of them, 3-way homed at
> a single site...  hmm, that's 100 extra addresses you didn't want to
> use).  I suspect that upstream providers will not be thrilled to hand
> out more address space if they discover it is being put to such
> inefficient use.

I don't think so.  If you are using PA space, then the fact that you might
have to burn 3X as much PA space isn't going to bother any one of the P=3
providers in your supposition.  Listen up, folks -- if you can't get routable
PI space, you have to make do with what you CAN get.

But there's no guarantee that you need separate addresses per home page.
If you don't count Lynx or Mosaic as part of your target audience, then you
can depend on the "Host:" header sent in requests by *all* modern browsers.

But if you do need to support old Lynx and Mosaic, you can assign all 100
PA addresses as virtual interfaces on a single "ifdefault" machine.
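
To make the "Host:" point concrete, here is a minimal sketch of how a front
end could pick a site from the request header instead of from the
destination address.  The function name pick_site, the sites table, and the
fallback argument are made up for illustration; this is not taken from squid
or from any real server, and it ignores IPv6 literals.

```python
def pick_site(request_bytes, sites, fallback):
    """Return the site for a raw HTTP request, keyed on its Host: header.

    `sites` maps lowercase host names to site identifiers (hypothetical).
    `fallback` is returned for clients that send no Host: header.
    """
    # Only the header block matters; drop any request body.
    head = request_bytes.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            # Host names are case-insensitive; drop an optional ":port".
            host = value.strip().lower().split(":", 1)[0]
            return sites.get(host, fallback)
    return fallback
```

A client that sends no "Host:" line falls through to the fallback, which is
the case where a separate address per site is still needed.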

Remember that the machine with the "ifdefault" hack just runs a Squid cache
in accelerator mode.  Your web servers are all highly custom and probably
very fragile, so you should leave them alone.  The "ifdefault" box is just a
front end -- it becomes, or adds a hop before, your exit gateway.

> 3) I have not looked at the code, but if it is on a per-interface
> basis, based on the addresses in the packets, that would seem to
> suggest that it might not like BSDI 3.0's virtual host scheme (adding
> IP addresses to the loopback port and then proxy-arping them onto the
> wire).  If this is correct, that would mean you would have to use a
> different physical machine for each customer.  Of course, on this
> point I'm purely speculating.

Indeed you are, sir!  The interfaces that matter are the uplink ones, not
the downlink ones.  A SYN packet comes in on some interface, and what
"ifdefault" is trying to do is make sure your SYN-ACK goes out to the exit
gateway that's reachable via that same interface.  The local end of the
TCP connection is bound to a local socket; we're just trying to get the
"remote" end of each TCP connection bound to a reasonable upstream gateway
rather than having to use a single system-wide default or run full BGP.
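
The behaviour described above can be modelled in a few lines.  This is only a
user-space sketch of the idea, not the actual "ifdefault" code; the class
name, the method names, and the interface labels are all assumptions made for
illustration.

```python
class IfDefaultTable:
    """Toy model: replies leave via the gateway of the interface the SYN used."""

    def __init__(self, gateways, system_default):
        # gateways: uplink interface name -> next-hop gateway (hypothetical)
        self.gateways = dict(gateways)
        self.system_default = system_default
        self.conn_iface = {}  # connection 4-tuple -> arrival interface

    def on_syn(self, conn, iface):
        """Remember which uplink interface a connection's SYN arrived on."""
        self.conn_iface[conn] = iface

    def next_hop(self, conn):
        """Gateway for outbound packets (SYN-ACK onward) of this connection."""
        iface = self.conn_iface.get(conn)
        # Unknown connections fall back to the single system-wide default.
        return self.gateways.get(iface, self.system_default)
```

For example, with uplinks "isp-a" and "isp-b", a connection whose SYN arrived
on "isp-b" gets the "isp-b" gateway from next_hop(), whatever the system-wide
default route says.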

> 4) It puts the onus for fail-over on the DNS server, which means one
> is going to be using very short TTL.

People who multihome do that anyway.

> 5) Unless (#1), (#4) implies that fail-over will be manual.  Is your
> Emacs ready to rock and roll on 50 zone files?

No, it isn't, but there are four or five packages in the /contrib subdir of
BIND that can robohack your zone files to this end.
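
As a rough idea of what such robohacking amounts to, the sketch below emits
short-TTL A records only for the uplinks currently believed to be up.  It is
not one of the BIND contrib packages; the function name, the uplink labels,
and the 60-second TTL are assumptions, and a real tool would also have to
bump the SOA serial and reload the zone.

```python
def render_a_records(name, addr_by_uplink, up, ttl=60):
    """Render zone-file A records for `name`, one per uplink that is up.

    addr_by_uplink: uplink label -> address assigned out of that provider
    up:             set of uplink labels currently considered reachable
    ttl:            short TTL so resolvers re-query soon after a failure
    """
    lines = []
    for uplink in sorted(addr_by_uplink):
        if uplink in up:
            # Zone-file record layout: name, TTL, class, type, rdata.
            lines.append(f"{name}\t{ttl}\tIN\tA\t{addr_by_uplink[uplink]}")
    return "\n".join(lines)
```

Running this over each of the 50 zones whenever an uplink changes state is
the part that replaces hand editing.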

> I admire Paul's hack; it is spiffy for what it is, but I would hardly
> promulgate it as an advised way to multihome without using BGP.

But, but... I *DID* it.  I didn't just write the code (actually I didn't
write much of the code, Ted Lemon wrote most of it) -- I ran this stuff on
a high volume pornography site for three months and the credit card
transaction dollar-o-meter was never as busy or as steady, before or since.

I know it sounds hacky.  But so did Ethernet's exponential backoff.  The
thing that makes this hack work is counterintuitive, but the success is
measurable.

That's two folks who have come out today and said "well that's no damn good"
without trying it.  I'm surprised; NANOG members usually have a more positive
attitude.