North American Network Operators Group

Re: BGP keepalive/holdtime at GigE exchange

  • From: Clayton Fiske
  • Date: Fri Jan 12 16:09:53 2001

On Fri, Jan 12, 2001 at 03:23:51PM -0500, Deepak Jain wrote:

> I think the argument is one of stability. BGP is supposed to be stable for
> days/weeks on end normally. Making your internal network too sensitive to
> external changes destabilizes your network and those who connect to you.
> 
> If a BGP session with one peer resets once every three days, and you peer
> with them at a few places, at most you are talking about a service
> degradation for about 5-10 minutes as say 1/3 of your packets are resent
> or dropped (assuming you peer in three places, etc). 180 seconds is
> nothing for a router with many peering sessions and a reasonable traffic
> load. 

With regard to your earlier comments about busy routers "pausing"
BGP, perhaps this is something that can be investigated at a vendor
software level. I would think keepalives (of any variety) should rank
fairly high on the food chain in terms of CPU precedence. If this isn't
the case already, why not? I don't know how true it is anymore, but I
recall a few years back having to deal with some routers which got
bogged down with OSPF updates to the point that they kept resetting
perfectly stable links (or the other end did) due to keepalives not
being processed in a timely manner. In the interest of stability, I
would certainly want keepalives to be processed ahead of routing
updates. After all, it's not as though they even represent a significant
percentage of the total workload on the CPU, even when you reach a
reasonably high number of links. And if your links keep resetting due
to route churn, you've got a self-perpetuating problem.
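
To make the ordering idea concrete, here is a rough sketch in Python of a control-plane work queue that always serves keepalives before routing updates. It is not how any particular vendor's software works; the class name, the two message kinds, and the priority values are all assumptions made up for illustration.

```python
import heapq
import itertools

# Lower number is served first. These values are illustrative assumptions,
# not taken from any router implementation.
PRIORITY = {"keepalive": 0, "update": 1}


class ControlPlaneQueue:
    """Hypothetical work queue that serves keepalives ahead of updates."""

    def __init__(self):
        self._heap = []
        # Sequence counter breaks ties so equal-priority items keep arrival order.
        self._seq = itertools.count()

    def enqueue(self, kind, peer):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, peer))

    def dequeue(self):
        _, _, kind, peer = heapq.heappop(self._heap)
        return kind, peer

    def __len__(self):
        return len(self._heap)


def drain(queue):
    """Pop everything from the queue and return it in processing order."""
    order = []
    while len(queue):
        order.append(queue.dequeue())
    return order


q = ControlPlaneQueue()
q.enqueue("update", "peer-a")
q.enqueue("update", "peer-b")
q.enqueue("keepalive", "peer-a")
q.enqueue("update", "peer-a")
q.enqueue("keepalive", "peer-b")
processed = drain(q)
```

Even though the keepalives arrive behind a backlog of updates, both are handled first, so a burst of route churn would not by itself starve the hold timer in this toy model.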

> The bigger concern is IF a peer is dropping a session that often, *what*
> is wrong with their router? I am very afraid of routers that *randomly*
> timeout and re-peer with no good reason.

In this case, I would expect a NOC with proper monitoring of peering
sessions to take notice and initiate an investigation into the problem.
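
As a sketch of what "taking notice" could look like, the Python below counts session resets per peer in a sliding window and flags a peer once it crosses a threshold. The class, the threshold of two resets, and the seven-day window are assumptions chosen for illustration; timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque

DAY = 24 * 3600  # seconds


class FlapMonitor:
    """Hypothetical monitor that flags peers whose sessions reset too often."""

    def __init__(self, max_resets, window_seconds):
        self.max_resets = max_resets
        self.window_seconds = window_seconds
        self._resets = defaultdict(deque)

    def record_reset(self, peer, timestamp):
        """Record a reset; return True if the peer now exceeds the threshold."""
        events = self._resets[peer]
        events.append(timestamp)
        # Drop resets that have aged out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_resets


# Assumed policy: more than 2 resets in 7 days warrants a look.
monitor = FlapMonitor(max_resets=2, window_seconds=7 * DAY)

# A peer resetting once every three days, as in the quoted scenario.
alerts = [monitor.record_reset("peer-a", t) for t in (0, 3 * DAY, 6 * DAY)]

# A peer with a single isolated reset.
quiet = monitor.record_reset("peer-b", 1 * DAY)
```

With these assumed numbers, the peer that resets every three days is flagged on its third reset, while the peer with one isolated reset is not.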

-c