North American Network Operators Group


Re: IP failover/migration question.

  • From: Andrew Warfield
  • Date: Sun Jun 11 23:50:54 2006


> I'm trying to get a more clear understanding as to what is involved in
> terms of moving the IPs, and how fast it can potentially be done.

> can we presume that separate ip spaces and changing dns, i.e. maybe
> ten minutes at worst, is insufficiently fast?

Absolutely.  We are trying to explore the (arguably insane) idea of
failing things over sufficiently fast (and state-fully) that open
connections remain completely functional.

> I'm fairly sure that what I would like to do is to arrange what is
> effectively dual-homing, but with two geographically distinct homes:

> uh, that kinda inverts what we normally mean by 'multi-homing'.
> that's usually two upstream providers for a single site.

Yep, which is what I want -- it's just that the single site is going to move. ;)

Consider a traditional (single-site) dual-homed situation, where I'm
not doing any kind of balancing across the links.  In (my
understanding of) that case, I would use a private stub AS with the
two upstream links going to the common provider AS, and advertise a
change to the link weight on the backup link when I wanted a switch to
happen.  (Or if the primary failed, this would presumably happen
automatically through its link disappearing.)
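
To make the switching logic concrete, here is a toy sketch in Python.  It
assumes the "link weight" is expressed as a BGP MED value (lower is
preferred), which is only one possible reading of the paragraph above; the
Link class, the preferred_link function and the numbers are hypothetical,
and the rest of the real BGP decision process (local preference, AS-path
length and so on) is ignored.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    med: int         # assumed weight: the provider prefers the lower value
    up: bool = True  # False once the link (and its advertisement) disappears


def preferred_link(links):
    """Return the name of the usable link with the lowest MED, or None."""
    usable = [link for link in links if link.up]
    if not usable:
        return None
    return min(usable, key=lambda link: link.med).name


primary = Link("primary", med=10)
backup = Link("backup", med=100)
links = [primary, backup]

# Normal operation: the primary link carries the traffic.
normal = preferred_link(links)

# Planned switch: advertise a better weight on the backup link.
backup.med = 5
after_switch = preferred_link(links)

# Unplanned failure: restore the weights, then lose the primary link.
backup.med = 100
primary.up = False
after_failure = preferred_link(links)

print(normal, after_switch, after_failure)
```

In both the planned and the unplanned case the selection moves to the backup
link without any change to the advertised prefix itself.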

In this new scheme, I want to make _everything_ redundant.  The backup
link is to a geographically distinct site, and all of the hosts in the
primary site are actively mirrored to the backup site: OS,
applications, TCP connection state and all.  So it's _kind of_ dual
homing -- two upstream links for a single (virtual) site.

...
> i am sure others can come up with more clever hacks.  beware if
> they're too clever.

I completely agree with your comments regarding clever hacks, which is
why I'm trying to draw an analogy to dual-homing, a technique that's
known, trusted, and clearly not fraught with corner-cases and devilish
complexity. ;)  Seriously though, I'm trying to convince myself that
there is a reasonable approach here that is within the means of
datacenter operators and their ISPs, and that would allow a switchover
with reconfiguration time on the order of seconds.

> persistent tcp connections from clients would not fare well unless
> you actually did the hacks to migrate the sessions, i.e. tcp serial
> numbers and all the rest of the tcp state.  hard to do.

Since we move the entire OS, the TCP state goes with it.  We've done
this in the past on the local link by migrating the host and sending
an unsolicited ARP reply to notify the switch that the IP has moved to
a new MAC (http://www.cl.cam.ac.uk/~akw27/papers/nsdi-migration.pdf).
I think that order-of-seconds reconfiguration should allow the same
sort of migration to work at a larger scope.
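
For readers unfamiliar with the mechanism, an unsolicited ARP reply is just
an ordinary ARP reply that nobody asked for, broadcast so that neighbours
update their IP-to-MAC mapping.  The following is a minimal sketch in Python
of what such a frame looks like on the wire; it is not the code from the
paper.  The function name, the MAC address and the IP address (taken from
the 192.0.2.0/24 documentation range) are hypothetical, and setting the
target hardware address to broadcast is one common convention, not the only
one.

```python
import socket
import struct

BROADCAST_MAC = bytes.fromhex("ffffffffffff")
ETHERTYPE_ARP = 0x0806
ARP_OP_REPLY = 2


def build_gratuitous_arp_reply(mac: str, ip: str) -> bytes:
    """Build a broadcast Ethernet frame carrying an unsolicited ARP reply."""
    sender_mac = bytes.fromhex(mac.replace(":", ""))
    sender_ip = socket.inet_aton(ip)

    # Ethernet header: destination, source, EtherType.
    eth_header = BROADCAST_MAC + sender_mac + struct.pack("!H", ETHERTYPE_ARP)

    # ARP payload: sender and target protocol address are both the moved IP.
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,              # hardware type: Ethernet
        0x0800,         # protocol type: IPv4
        6,              # hardware address length
        4,              # protocol address length
        ARP_OP_REPLY,
        sender_mac,     # the new MAC that now owns the IP
        sender_ip,
        BROADCAST_MAC,  # target hardware address (assumed convention)
        sender_ip,
    )
    return eth_header + arp_payload


frame = build_gratuitous_arp_reply("00:16:3e:00:00:01", "192.0.2.10")
print(len(frame), frame.hex())
```

Actually transmitting the frame needs a raw link-layer socket and the
appropriate privileges, which is platform-specific and left out here.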

> well, you left of mention of us legislative follies and telco and
> cable greed.  but maybe you can get away with a purely technical
> question once if you promise not to do it again. :-)

Thanks!  And thanks everyone for the feedback -- incredibly helpful.
I'll try for follies and greed next time. ;)

a.