North American Network Operators Group

RE: Global BGP - 2001-06-23 - Vendor X's statement...

  • From: Sean Donelan
  • Date: Tue Jun 26 22:26:51 2001

On Tue, 26 June 2001, "Richard A. Steenbergen" wrote:
> On 26 Jun 2001, Sean Donelan wrote:
> > There will always be cases where Vendor A thinks they are correct and
> > Vendor B thinks they are correct, and they differ.  And you are
> > correct, either the sender has done something wrong or the receiver
> > has done something wrong, hence the Internet motto.
> 
> But there should be no room for debate, one side is right and the
> other side is wrong. If there is really a grey area, the solution is to
> fix the wording of the standards document, not to try and overlook the
> problem.

I'm not proposing we overlook the problem.  However, software is very
bad at deciding who is right and who is wrong.  Other than malware, most
vendor software does not deliberately send bad data.  The software, or
rather the programmer who wrote the software, thought the program was
sending correct data.  Later when humans looked at the data, humans
decided the data was wrong and fixed the software.

What do we do between the time the software makes an error and the time
the humans can intervene?

Have the software, with no human oversight, nuke everything?  The Blue
Screen Of Death may be a very "safe" thing for software to do when it
encounters an error.  However, it is not a very good thing for system
availability.

I agree error handling is "hard."

Aborting the entire BGP session makes the Internet more brittle than
necessary. In the hours/days between the software sending the data, and
the humans fixing it, the network was hurting a lot more than you would
expect from a single bad route.  The constant cycle of abort, reset, route
flap was an amazing multiplier effect of one bad route.
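
To put rough numbers on that multiplier, here is a toy model in Python.
It assumes that every session reset withdraws all routes learned on that
session and then re-learns them, and that dropping a bad route touches
only that route.  The function names and the figures in the usage note
are made up for illustration; they are not measurements from the
2001-06-23 event.

```python
def churn_from_resets(resets: int, routes_on_session: int) -> int:
    """Route changes caused by tearing a session down `resets` times.

    Assumption: each reset removes every route learned on the session
    (one change per route) and re-learns it after the session comes
    back (a second change per route).
    """
    return resets * routes_on_session * 2


def churn_from_drop(bad_routes: int) -> int:
    """Route changes when only the bad routes are dropped.

    Assumption: the session stays up, so nothing else is disturbed.
    """
    return bad_routes
```

With hypothetical figures of 10 resets on a session carrying 100,000
routes, churn_from_resets(10, 100_000) is 2,000,000 route changes, while
churn_from_drop(1) is 1.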

> I agree that in this case it is possible to have ignored the bad AS PATH
> and drop the route without disturbing the session originating the bad
> information. This one specific example could probably have been handled
> better with a non-fatal notification (with big red lights and buzzers).
> However, it was unacceptable for that router to propagate the bad
> information to others.

I agree, you must have both sides (conservative send, and liberal receive).

Sending bad data is not acceptable.  Cisco should not send bad data.

Crashing/aborting when you receive bad data isn't acceptable either.  Bad
data happens; Vendor X should not abort if it has other options.

Sometimes there is no alternative besides aborting.  However, the RFC makes
aborting a requirement.  There are errors BGP implementations could recover
from (with blinking red lights and loud buzzers).  The RFC should give the
option of continuing to implementations.
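
As a minimal sketch of what that option could look like, here is some
Python.  Everything in it is hypothetical: the Policy, Session,
as_path_is_valid and handle_update names, and the validity check itself,
are invented for illustration and do not come from any RFC or vendor
implementation.  It only contrasts "abort the session" with "drop the one
route, raise an alarm, and do not propagate it."

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    ABORT_SESSION = "abort"   # tear down the whole session on a bad route
    DROP_AND_ALARM = "drop"   # discard only the bad route, keep the session


@dataclass
class Session:
    policy: Policy
    up: bool = True
    rib: dict = field(default_factory=dict)     # prefix -> AS path
    alarms: list = field(default_factory=list)  # the "red lights and buzzers"


def as_path_is_valid(as_path) -> bool:
    # Hypothetical check: a non-empty list of 2-byte AS numbers.
    return bool(as_path) and all(
        isinstance(asn, int) and 0 < asn < 65536 for asn in as_path
    )


def handle_update(session: Session, prefix: str, as_path) -> bool:
    """Return True only if the route was accepted and may be propagated."""
    if not session.up:
        return False
    if as_path_is_valid(as_path):
        session.rib[prefix] = as_path
        return True
    # Bad data: always make noise so a human can look at it.
    session.alarms.append(f"bad AS path for {prefix}: {as_path!r}")
    if session.policy is Policy.ABORT_SESSION:
        session.up = False
        session.rib.clear()            # every route from this peer is lost
    else:
        session.rib.pop(prefix, None)  # lose only the offending route
    return False                       # either way, never propagate it
```

Under DROP_AND_ALARM a bad update leaves session.up True and the other
routes in session.rib untouched; under ABORT_SESSION the same update
empties session.rib.  In both cases handle_update returns False, so the
bad route is not passed on.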

"I was following the standard" isn't a good reason to crash.  If following
the standard causes the Internet to flap like a hummingbird for a day, 
we need to get the standard changed (as well as fix the existing
implementations).

These are not mutually exclusive goals.

   1) Modify the standard so an error does not have as much impact worldwide
   2) Fix the current implementations

Yes, a pedestrian may have the right of way in the crosswalk.  But proving
your point by having the semi-truck flatten you isn't very smart.