North American Network Operators Group

Re: Persistent BGP peer flapping - do you care?

  • From: Christopher A. Woodfield
  • Date: Thu Jan 17 15:10:41 2002

This has been bandied about before, but one should note that the "drop the 
peer if an error is received" approach is only really effective if the session that 
initiated the error does not propagate it. Most Cisco routers running common IOS 
images not only do not drop the session, but also pass along the bad prefix, which 
leads to the occasional bad route dropping peering sessions on most of 
the Enterasys(*) routers on the planet.

I guess the main question is what is considered an "error" - if the peer starts 
obviously misbehaving, then yes, drop the peer. But don't drop the peer due to an 
invalid prefix that most likely did not originate on that router - it would be much 
better for the 'net as a whole to just drop the bad prefix and carry on. Maybe an 
algorithm could be built in where the peer is dropped only if the number of bad 
prefixes exceeds a set threshold...
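
To make the threshold idea concrete, here is a minimal sketch in Python. It is 
not taken from any router implementation or from the BGP spec; the class name, 
the action strings, and the default threshold of 10 are all assumptions chosen 
for illustration.

```python
from dataclasses import dataclass


@dataclass
class BadPrefixPolicy:
    """Hypothetical per-peer policy: discard bad prefixes, and drop the
    session only once the count of bad prefixes exceeds a threshold."""

    max_bad_prefixes: int = 10  # assumed value, not from any spec
    bad_count: int = 0

    def on_bad_prefix(self) -> str:
        """Record one invalid prefix and return the action to take."""
        self.bad_count += 1
        if self.bad_count > self.max_bad_prefixes:
            return "drop_session"
        return "discard_prefix"


policy = BadPrefixPolicy(max_bad_prefixes=2)
actions = [policy.on_bad_prefix() for _ in range(3)]
```

With a threshold of 2, the first two bad prefixes are simply discarded and only 
the third one takes the session down. A real version would probably also want 
to age the counter out over time, which this sketch does not do.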

In short, the "drop the session when you get a bad prefix" rule only serves its intended 
purpose when every router that speaks BGP follows it. If that can't be had, we 
should really revisit the spec in that regard.

-Chris

(*) among other vendors; it was a customer's Enterasys router that got my attention 
the last time it happened...

On Thu, Jan 17, 2002 at 02:27:25PM -0500, Susan Hares wrote:
> 
> NANOG --
> 
> We are finalizing a revision of the BGP specification.  It is in
> last call for your new BGP specification.
> 
> This BGP revision is to match the bgp in deployment.
> One part of the specification remains, a fix for a problem called
>  "persistent bgp flapping".   We urgently need input from nanog folks
> on what is deployed.
> 
> 
> Here's the description of "persistent bgp flapping"
> from the BGP specification:
> 
> >If a BGP speaker detects an error, it shuts down the connection
> >and changes its state to Idle. Getting out of the Idle state
> >requires generation of the Start event.  If such an event is
> >generated automatically, then persistent BGP errors may result
> >in persistent flapping of the speaker.
> 
> 
> 1) Do any of the ISPs see the persistent bgp peer flapping now?
> 
>    Does anyone from the ISP community ever experience
>    anything like this BGP persistent peer flapping?
>    The solution is to have an exponential backoff in
>    the rate of sending the Opens.
> 
> 
>    If you have seen this, how many routers did this persistent
>    bgp peer flapping impact?  (Can you give a % of your routers or a
>    total number)?   How often does this impact your routers?
> 
> 
> 2) Is this feature on in your machine by default?
>    If not, do you configure the exponential rates?
> 
> 
> 3) Do you track if your routers are in this state?
>    How do you track if your routers are in this state?
> 
> 
> Sue Hares
> 
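
For reference, the exponential backoff on sending Opens that the quoted message 
mentions could look roughly like the Python sketch below. The function name, the 
30-second base, the doubling factor, and the one-hour cap are assumptions for 
illustration only; neither message specifies these values.

```python
def open_retry_delay(failures: int,
                     base: float = 30.0,
                     factor: float = 2.0,
                     cap: float = 3600.0) -> float:
    """Seconds to wait before the next automatic Start event (and Open),
    given how many consecutive session failures have occurred so far.

    All defaults are hypothetical; they are not taken from the BGP spec.
    """
    if failures < 0:
        raise ValueError("failures must be non-negative")
    # Grow the delay geometrically, but never beyond the cap.
    return min(cap, base * factor ** failures)


delays = [open_retry_delay(n) for n in range(5)]
```

Each consecutive failure doubles the wait until the cap is reached, so a peer 
with a persistent error retries less and less often instead of flapping at a 
constant rate. The failure count would be reset once a session stays up.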

-- 
---------------------------
Christopher A. Woodfield		[email protected]

PGP Public Key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xB887618B