North American Network Operators Group

Re: Persistent BGP peer flapping - do you care?

  • From: Susan Hares
  • Date: Fri Jan 18 21:38:38 2002

Brian:

Thank you for your 2 cents.  I'm gathering all the input until
Sunday night.  I really appreciate your comments.  I'll summarize
all the input to the list at that time, and suggest some ideas.

I'll try to boil all the input on this problem into a document that
I can post to IDR and NANOG.

Sue

PS - I'm away from email from now until Monday am. Thanks nanog folks!!



At 07:30 PM 1/17/2002 -0500, Dickson, Brian wrote:

Here's my two cents...

A good rule of thumb (the robustness principle, from RFC 793 and restated
in RFC 1122) is, be liberal in what you accept and strict in what you send.

When applied to BGP, I would suggest that any implementation should choose a
canonical form for constructing updates, but use a parser that tolerates
rule-bending without rule-breaking.

On the issue of existing vendor implementations, and how to build the specs
to prevent meltdowns:

I would suspect that during implementation, brand C routers were the victims
during testing, and perhaps the change was made to avoid that happening.

The current state of affairs is very much like the classical game-theory
"prisoner's dilemma".

The new spec should have two goals - discourage any implementation which can
lead to meltdowns, and encourage strict adherence to the spec. The latter
can be achieved via the former, in fact, if the mechanisms are well chosen.

My suggestion would be, rather than backing off on resetting BGP sessions,
to first attempt strict interpretation (to insulate against completely
insane routers), and then fall back to loose interpretation. The model is
"Fool me once, shame on you, fool me twice, shame on me."

On first receiving a bad update, reset. If upon re-establishing the session,
the same bad update is heard, drop the bad update but keep the session up
(along with the messages back, etc.)
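
The "reset once, then drop" rule above can be sketched roughly as follows.
This is only an illustration: the PeerSession class, its method names, and
the use of a SHA-256 fingerprint to recognise "the same bad update" are all
assumptions, not part of any BGP spec or vendor implementation.

```python
import hashlib


class PeerSession:
    """Hypothetical per-peer state; not taken from any real BGP code."""

    def __init__(self):
        self.established = True
        # Fingerprints of bad updates that have already caused one reset.
        self.seen_bad = set()

    def on_bad_update(self, raw_update: bytes) -> str:
        """Return "reset" the first time a bad update is seen, else "drop"."""
        fingerprint = hashlib.sha256(raw_update).hexdigest()
        if fingerprint in self.seen_bad:
            # Fool me twice: discard the update but keep the session up.
            return "drop"
        # Fool me once: strict interpretation, tear the session down.
        self.seen_bad.add(fingerprint)
        self.established = False
        return "reset"

    def on_reestablished(self):
        """Called when the session comes back up after a reset."""
        self.established = True
```

Keeping the fingerprints across the reset is what lets the second
occurrence be recognised; a real implementation would also need some way
to age them out.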

One additional optional behaviour I would suggest - look at the AS path
and/or path length and/or announcing router IP address. If heard from the
originator, drop the session (and either keep it down, or try one more time
before requiring operator intervention); it may be the case that only these
conditions strictly require a reset, and that all other situations may only
require the "ignore bad routes" behaviour.
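
One possible reading of the originator check, as a rough sketch. The
function name and the test used here (an AS path that contains only the
peer's own AS, possibly prepended) are assumptions made for illustration;
the path-length and router-address checks mentioned above are left out.

```python
def action_for_bad_update(as_path, peer_as):
    """Return "reset" if the peer looks like the originator, else "ignore".

    as_path: list of AS numbers as received, nearest AS first.
    peer_as: AS number of the directly connected peer.
    """
    # Assumption: a path made up only of the peer's own AS (possibly
    # prepended several times) means the peer originated the route.
    if as_path and set(as_path) == {peer_as}:
        return "reset"
    # Heard via someone else (or no path at all): drop the bad route only.
    return "ignore"
```

For example, action_for_bad_update([65001, 65001], 65001) returns "reset",
while action_for_bad_update([65001, 65002], 65001) returns "ignore".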

Resetting BGP more than a small, finite number of times is, IMHO, a bad
idea. After all, BGP is a stateful protocol, and state changes should be
triggered deterministically, even if that requires operator input.
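
A minimal sketch of capping automatic resets, assuming a hypothetical
ResetBudget guard per peer. The default limit of 2 is an arbitrary stand-in
for "a small, finite number", and operator_clear() stands in for whatever
manual action an operator would take.

```python
class ResetBudget:
    """Hypothetical guard that caps automatic session resets for one peer."""

    def __init__(self, max_resets=2):
        self.max_resets = max_resets  # assumed value, not from any spec
        self.resets = 0
        self.held_down = False

    def request_reset(self):
        """Return "reset" while budget remains, otherwise "hold-down"."""
        if self.held_down or self.resets >= self.max_resets:
            # Budget exhausted: stay down until an operator steps in.
            self.held_down = True
            return "hold-down"
        self.resets += 1
        return "reset"

    def operator_clear(self):
        """Operator intervention: allow automatic resets again."""
        self.resets = 0
        self.held_down = False
```

Once the budget is used up, the only way out of "hold-down" is the explicit
operator call, so every state change has a deterministic trigger.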

Brian Dickson
Velocita