North American Network Operators Group

Re: That pesky AS path corruption bug...

  • From: Daniel Senie
  • Date: Tue May 23 14:20:52 2000

Kai Schlichting wrote:
> 
> At Tuesday 01:26 PM 5/23/00 , Vijay Gill wrote:
> 
> >This is a hack.  We do not need more cruft added on, rather, what we need
> >is correct behavior.  The correct behavior being - if you see a corrupt/ a
> >malformed update from a peer, send a notify and drop the session.  Seems
> >fairly simple to me.
> >
> >The above suggestion of yours fails in the case of route servers.
> >
> >Insist on correct behavior, not on cruftery.
> 
> ...reading the host requirements RFC and its definition of the
> robustness principle: Why was the behavior above chosen over the
> more conceivable and robust "ignore (log) corrupted message, continue
> with regular operation" ? Given route flap dampening, dropping the BGP
> session is hardly the desirable outcome here. On that note: under
> what circumstances should or shouldn't the BGP session come back up
> without manual intervention?

Well, let's see... the corrupted message was delivered over a TCP
session. That means the data sent is what the router at the other end
sent. Little likelihood of in-transit damage. So, we've got a router at
the remote end which is generating mangled messages. Now, do you trust
that the mangled message was the result of a single-event failure that
won't recur, or did that remote router suffer some sort of serious brain
cramp (software or hardware failure) which will result in additional bad
messages? At some point it makes sense to cut one's losses, declare the
remote device braindead, and route around it.

If a session goes down because of a BGP session problem (bad message),
it is worthwhile to either not bring the circuit back automatically, or
if automatic, implement a backoff mechanism as a form of local route
flap damping. Indeed, based on Pete's posting, this is exactly what is
supposed to happen.
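
To make the backoff idea concrete, here is a minimal Python sketch of
a reconnect policy. It is only an illustration: the class name, the
reason strings, the 30-second base delay and the one-hour cap are all
hypothetical, and none of it is taken from any router implementation
or from the BGP specification.

```python
from dataclasses import dataclass


@dataclass
class ReconnectPolicy:
    """Hypothetical reconnect policy for a single BGP peer."""

    base_delay: float = 30.0    # assumed first wait, in seconds
    max_delay: float = 3600.0   # assumed cap on the wait, in seconds
    bad_message_drops: int = 0  # consecutive drops caused by bad messages

    def on_session_down(self, reason: str) -> float:
        """Return how many seconds to wait before retrying the session."""
        if reason == "bad_message":
            # The peer sent something mangled: double the wait on each
            # consecutive drop, up to max_delay.
            self.bad_message_drops += 1
            delay = self.base_delay * 2 ** (self.bad_message_drops - 1)
            return min(delay, self.max_delay)
        # Link-level outage (cable pulled, line hit): retry right away.
        return 0.0

    def on_session_stable(self) -> None:
        """Call once the session has stayed up long enough to trust."""
        self.bad_message_drops = 0
```

With these assumed values, consecutive bad-message drops wait 30, 60,
120 seconds and so on until the cap is reached, while a session that
drops because the link went away is retried without delay. Setting
max_delay very high approximates requiring manual intervention.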

If a session goes down because someone unplugged a cable and reconnected
it, there should not be a need for manual intervention. Similarly, if
you have a T1 get hit by lightning and the surge suppressors work right,
you may well see the line go down, then come back up.

-- 
-----------------------------------------------------------------
Daniel Senie                                        [email protected]
Amaranth Networks Inc.                    http://www.amaranth.com