North American Network Operators Group

Re: Global BGP - 2001-06-23

  • From: lucifer
  • Date: Sun Jun 24 18:19:14 2001

Brett Frankenberger wrote:
> 
> > Out of curiosity - did anyone see a period of significant instability
> > in the global routing tables on Saturday afternoon? Without violating NDA,
> > all I can say is that it resembled a historic event involving a bad route,
> > Ciscos, and Bay routers (only this time, it was a bad route, Ciscos, and
> > <X> vendor whom I cannot name but is being soundly beaten with wet noodles
> > to resolve the issue). The bad route, and instability, were seen across
> > all of our transit vendors (all "household" names of transit service).
> 
> Hmm ... why is <X> being beaten?  Was the problem reversed this time?
> 
> The only historic event I can recall involving a bad route, Cisco, and
> Bay (actually, events would be better, since it happened at least
> twice) was a case of (a) someone injecting a bad route, (b) the cisco
> at the other end accepting it in violation of the RFC, (c) ciscos
> passing that bad route all around the internet, all in violation of the
> RFC, (d) that route eventually hitting a cisco<->bay peering
> connection, and (e) the Bay (although the problem wasn't limited to
> Bay, as gated, and possibly other implementations as well, behaved the
> same way) properly sending a NOTIFY and taking down the BGP session, as
> required by the RFC.

A) Ciscos flap sessions, according to the only reports I've heard.
B) <X> routers were crashing, either due to the bug, or the session resets.
   Thus, <X> is being flogged. I have reports of at least one <Y> having
   problems, as well.
C) I would post the BugID, but the only source I have is under NDA. However,
   having now heard this much in a public forum (i.e., not covered), I can say
   "Invalid AS path data bug".
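
For readers unfamiliar with the RFC behaviour Brett describes above, here is
a rough sketch of it in Python. It assumes the RFC 1771 encoding of the
AS_PATH attribute (one octet of segment type, one octet of AS count, then
2-octet AS numbers). The function and exception names are hypothetical, and
this is not any vendor's code, nor a reconstruction of the announcement in
question. It only illustrates "reject a malformed AS path and tear the
session down" as opposed to "accept it and pass it on".

```python
import struct

# Constants from RFC 1771; everything else below is illustrative naming.
AS_SET, AS_SEQUENCE = 1, 2        # valid path segment types
UPDATE_MESSAGE_ERROR = 3          # NOTIFICATION error code
MALFORMED_AS_PATH = 11            # NOTIFICATION error subcode


class MalformedASPath(Exception):
    """Raised when an AS_PATH attribute value cannot be parsed."""


def parse_as_path(data: bytes):
    """Parse an AS_PATH value into a list of (segment_type, [asns])."""
    segments = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 2:
            raise MalformedASPath("truncated segment header")
        seg_type, count = data[offset], data[offset + 1]
        offset += 2
        if seg_type not in (AS_SET, AS_SEQUENCE):
            raise MalformedASPath(f"unknown segment type {seg_type}")
        end = offset + 2 * count  # 2 octets per AS number (assumption)
        if end > len(data):
            raise MalformedASPath("segment overruns attribute")
        asns = list(struct.unpack(f"!{count}H", data[offset:end]))
        segments.append((seg_type, asns))
        offset = end
    return segments


def handle_update(as_path_value: bytes):
    """Accept a well-formed path; otherwise signal NOTIFY and close."""
    try:
        return ("accept", parse_as_path(as_path_value))
    except MalformedASPath:
        # The strict behaviour: send NOTIFICATION (3, 11) and drop the
        # session instead of propagating the route to other peers.
        return ("notify_and_close", (UPDATE_MESSAGE_ERROR, MALFORMED_AS_PATH))
```

A speaker that skips this check and re-advertises the route pushes the
failure onto the first strict peer further downstream, which matches the
Cisco-to-Bay pattern in the quoted text.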

> It only took two major outages before Cisco fixed the problem.  (The
> BGP advertisement was posted to NANOG both times, as was the BugID the
> second time.)  

I have the guilty announcement, but again, it's under NDA. However, I can
say that we are now seeing this announcement from all of our upstreams,
non-blocked, so it appears that they fixed the originating point.

> So if this is the same issue, Cisco would be the vendor to flog,
> although assuming they didn't re-introduce it, the flogging might more
> correctly be directed at providers still running code old enough to
> have this particular problem.

I would flog Cisco as well, but A) they have a bug on it already, and B)
we're not using Ciscos for our core (note: this is my personal email, and
I am not speaking for my employer; however, this is publicly documented
on my employer's website, so it's not NDAed).

> Both my transits (Bay on my end, Cisco on the other end) made it
> through just fine, though.  (This time.  The last two times it
> happened, the ciscos on the other end happily passed the invalid route
> to me and the Bay on my end happily dropped the BGP session, and this
> was repeated ad infinitum until the bogus route was removed from the
> other end.)

I have no data on Bay; my apologies if this wasn't clear. Bay was *only*
being referenced as a historical point of note. No attempt at FUD, and my
apologies if anyone read it that way.
-- 
***************************************************************************
Joel Baker                           System Administrator - lightbearer.com
[email protected]              http://www.lightbearer.com/~lucifer