North American Network Operators Group


RE: BGP keepalive/holdtime at GigE exchange

  • From: Deepak Jain
  • Date: Fri Jan 12 15:29:40 2001

I think the argument is one of stability. BGP is supposed to be stable for
days/weeks on end normally. Making your internal network too sensitive to
external changes destabilizes your network and those who connect to you.

If a BGP session with one peer resets once every three days, and you peer
with them at a few places, at most you are talking about a service
degradation for about 5-10 minutes as say 1/3 of your packets are resent
or dropped (assuming you peer in three places, etc). 180 seconds is
nothing for a router with many peering sessions and a reasonable traffic
load. 
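
As a rough sketch of that arithmetic (the function name is made up for
illustration, and it assumes exactly one reset every three days hitting
1/3 of the traffic for the whole 5-10 minutes):

```python
def degraded_traffic_fraction(reset_interval_s, degradation_s, affected_share):
    """Long-run fraction of traffic that is resent or dropped.

    reset_interval_s: seconds between session resets
    degradation_s:    seconds of degraded service per reset
    affected_share:   share of traffic on the affected peering point
    """
    return (degradation_s / reset_interval_s) * affected_share


THREE_DAYS_S = 3 * 24 * 3600

for minutes in (5, 10):
    frac = degraded_traffic_fraction(THREE_DAYS_S, minutes * 60, 1 / 3)
    print(f"{minutes} min of degradation every 3 days: {frac:.4%} of traffic")
```

Under those assumptions the long-run impact stays under a tenth of a
percent of traffic.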

It's not exciting, but the other peer's customers are just as screwed. If
the whole fabric goes down, a good dampening policy at your
internal->BR routers will keep the instability from influencing your
core.

The bigger concern is IF a peer is dropping a session that often, *what*
is wrong with their router? I am very afraid of routers that *randomly*
time out and re-peer for no good reason.

Most networks insert new routes at internal/CR/other routers, and those
routes are automatically distributed to their borders; this way, internal
route changes do not require resetting external peers to take effect.

So maybe I am misunderstanding your concern, but why micromanage BGP timers
on your routers when a reasonably sized network may have more than 1000
external peering sessions, and each router on both sides has different
loading characteristics that are not stable?
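
Part of what makes per-neighbor tuning hard to reason about is that the
hold time is negotiated: per the BGP spec (RFC 1771, later RFC 4271), each
side proposes a hold time in its OPEN message and the session uses the
smaller of the two. A minimal Python sketch of that rule follows; the
function names are hypothetical, and the one-third keepalive is the spec's
suggested ratio, not necessarily what a given vendor implements.

```python
def negotiate_hold_time(local_s, remote_s):
    """Return the hold time the session will use, in seconds.

    The spec requires a hold time of either 0 (timers disabled) or at
    least 3 seconds, and the session uses the smaller proposed value.
    """
    for value in (local_s, remote_s):
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")
    return min(local_s, remote_s)


def keepalive_interval(hold_s):
    """Suggested keepalive: one third of the hold time (0 means none)."""
    return hold_s // 3


# Defaults quoted in this thread: IOS 12.x proposes 180, JunOS 4.2 proposes 90.
hold = negotiate_hold_time(180, 90)
print(hold, keepalive_interval(hold))

# One side tuned down to 30 lowers the session timer for both peers.
hold = negotiate_hold_time(30, 180)
print(hold, keepalive_interval(hold))
```

In other words, a hold time of 30 set on one side applies to the slow or
overloaded router on the other side too, whether or not it can keep up.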

Inbound prefix limits are my personal interest in a lot of these
per-neighbor configs, and even then, a big customer signing on with or
leaving a peer causes the prefix limits to get hit or be meaningless; I
only recommend them for use with peers that have fat-finger engineers
working at 4am. :)

Deepak Jain
AiNET




On Fri, 12 Jan 2001, Lane Patterson wrote:

> 
> Hmm, I know there are a lot of overburdened BR's out there, but
> since this is set on a per-neighbor basis, there should at least
> be room for some selective optimization.  It seems a bit crazy
> to think that each time there's a BR maintenance/reboot at an IXP,
> peers will continue to send to the bit bucket in the sky for 180+
> seconds.
> 
> > -----Original Message-----
> > From: Deepak Jain [mailto:[email protected]]
> > Sent: Friday, January 12, 2001 11:48 AM
> > To: Lane Patterson
> > Cc: '[email protected]'
> > Subject: RE: BGP keepalive/holdtime at GigE exchange
> > 
> > 
> > 
> > 
> > The problem I have seen with setting BGP timeouts that low is when
> > peering with overloaded or slow/old routers. Often they will "pause"
> > their BGP activity while they are actively peering or repeering across
> > their internal or external network. The low times will then cause more
> > timeouts before the fabric has stabilized.
> > 
> > Deepak Jain
> > AiNET
> > 
> > On Fri, 12 Jan 2001, Lane Patterson wrote:
> > 
> > > 
> > > Hmm, many folks didn't seem to understand the context here.
> > > 
> > > fast-external-fallover doesn't apply if a peer BR across a GigE
> > > exchange dies...you've still got link on your Gig port, so there
> > > is no link level indication of failure.
> > > 
> > > tweaking tcp timers is not the right approach...BGP explicitly
> > > has a keepalive for this exact purpose, when peering dies but
> > > your interface stays up.
> > > 
> > > the best non-radical suggestion so far is to simply tweak your
> > > keepalive to 10 and holdtime to 30 seconds, to bring this in line
> > > with the granularity of direct-connected peer interface or
> > > IGP metrics.
> > > 
> > > Do people do this?  Do people have problems doing this?
> > > 
> > > Do any folks do less than this on their eBGP peers, and at
> > > what tradeoff expense?
> > > 
> > > This is the old issue of finding the right operationally sane
> > > timeouts, not too high, not too low.  The defaults clearly
> > > seem too high, yet I haven't seen many cases where folks set 
> > > these down :-)
> > > 
> > > Cheers,
> > > -Lane
> > > 
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: Lane Patterson [mailto:[email protected]]
> > > > Sent: Thursday, January 11, 2001 10:08 PM
> > > > To: '[email protected]'
> > > > Subject: FW: BGP keepalive/holdtime at GigE exchange
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > I am looking for operational BCP feedback on common practice for
> > > > tweaking down BGP holdtime/keepalive across GigE exchange points,
> > > > since a peer could go down on the other side of the GigE switch
> > > > without a corresponding adjacency change seen on your BR.  The
> > > > thought is to make down peers known as fast thru a GigE exchange
> > > > as they would be over a POS private peer interface.
> > > > 
> > > > The current defaults are pretty gross, and much worse than the
> > > > ISIS hello and interface keepalive defaults of 10 seconds.
> > > > 
> > > > IOS12.x: neighbor [ip-address | peer-group-name] timers keepalive holdtime
> > > > 	holdtime: default 180 seconds	
> > > > 	keepalive: default 60 seconds
> > > > 
> > > > http://cco.cisco.com/univercd/cc/td/doc/product/software/ios121/121cgcr/ip_r/iprprt2/1rdbgp.htm#xtocid8553
> > > > 
> > > > JunOS 4.2: 
> > > > 	holdtime: default 90 seconds
> > > > 	keepalive: default one third of holdtime
> > > > 		
> > > > https://www.juniper.net/techpubs/software/junos42/swconfig-routing42/html/bgp-summary13.html#1015669
> > > > 
> > > > Cheers,
> > > > -Lane
> > > > 
> > > > Lane Patterson <[email protected]>
> > > > Equinix, Inc.
> > > > 
> > > 
> > > 
> > 
> > 
> 
>