North American Network Operators Group

Route Suppression Problem

  • From: Jack Bates
  • Date: Wed Mar 12 07:55:01 2003

Unless useful to others, feel free to just reply off-list.

Background:

Tuesday (yesterday) morning around 1am, I got a phone call from one of my
transit customers (which seems more like a dream). Sadly, I didn't have the
router they are on logging to a server, so it's impossible for me to see
exactly what happened. Here's what I have. They received a minor spike in
traffic going to them. My router shows the last BGP peer reset at about that
time, so the spike could be me sending the global table. His bandwidth then
dropped to 0 for almost exactly 30 minutes (MRTG isn't an exact graph). My
guess (authoritative answer) was that the customer flapped their routes once
too many times and was suppressed by both of my providers, as I seem to
recall the penalty heal rate is in 30 minute increments.
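
For anyone checking the arithmetic behind that guess, here is a rough sketch
of the exponential penalty decay used by route flap damping. The numbers
(15 minute half-life, reuse at 750, suppress at 2000, 1000 per flap) are
assumed, commonly cited vendor defaults, not values confirmed from either
provider. Under those assumptions, three quick flaps would take about 30
minutes to decay back to the reuse limit.

```python
import math

# Assumed values (commonly cited defaults), NOT taken from either
# provider's actual configuration.
HALF_LIFE_MIN = 15.0      # penalty halves every 15 minutes
REUSE_LIMIT = 750.0       # route is unsuppressed once penalty decays to this
SUPPRESS_LIMIT = 2000.0   # route is suppressed once penalty exceeds this
PENALTY_PER_FLAP = 1000.0


def decayed_penalty(penalty, minutes):
    """Penalty remaining after `minutes` of exponential decay."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def minutes_until_reuse(penalty):
    """Minutes until a penalty decays down to the reuse limit."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)


if __name__ == "__main__":
    # Three flaps in quick succession, ignoring decay between them.
    penalty = 3 * PENALTY_PER_FLAP
    print("suppressed:", penalty > SUPPRESS_LIMIT)
    print("minutes until reuse:", minutes_until_reuse(penalty))
```

With different half-life or reuse settings upstream, the outage length would
change accordingly, so the 30 minute match is suggestive rather than proof.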

First issue is, am I right? If I am, then I need to develop ways to limit
the damage done to my customer. Is there a way to set up route suppression
just under what most people use, so that I can have the client fix the
problem and then clear the suppression on my network to allow them to come
back up immediately, just under the suppress threshold? Another possibility,
although I've not seen reference to it: since the customer only transits
through my network and depends on my redundancy, is it possible to hold his
routes in the tables and keep advertising them out unless they are down for
a set time period (i.e., ignore flaps, but drop them if he's down 15-30
minutes)?
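
To make the first idea concrete, here is a small sketch comparing when a
lower local suppress limit would trip versus a typical upstream one for the
same series of flaps. The 1500 local limit, the 2000 upstream limit, the 15
minute half-life, and the flap times are all hypothetical, chosen only for
illustration. It models the penalty arithmetic alone, not how flaps actually
propagate between routers.

```python
# Assumed damping parameters, for illustration only.
HALF_LIFE_MIN = 15.0
PENALTY_PER_FLAP = 1000.0


def first_suppression(flap_times_min, suppress_limit):
    """Return the time (in minutes) of the flap that pushes the penalty
    past suppress_limit, or None if the limit is never crossed."""
    penalty = 0.0
    last = None
    for t in flap_times_min:
        if last is not None:
            # Decay the accumulated penalty since the previous flap.
            penalty *= 0.5 ** ((t - last) / HALF_LIFE_MIN)
        penalty += PENALTY_PER_FLAP
        last = t
        if penalty > suppress_limit:
            return t
    return None


if __name__ == "__main__":
    flaps = [0.0, 5.0, 10.0]  # hypothetical flap times in minutes
    print("local limit 1500 trips at:", first_suppression(flaps, 1500.0))
    print("upstream limit 2000 trips at:", first_suppression(flaps, 2000.0))
```

In this made-up sequence the local limit trips one flap earlier than the
upstream limit, which is the window the question is asking about.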

I've never seen this issue. I was aware suppression was possible when I
first started learning BGP, and so I have never risked bouncing my peers
more than three times in a day, and at that point I usually quit playing
until the next week. When my peers flap due to DDoS attacks, either BGP
never stabilizes fully or my providers have protected my networks (though I
haven't seen how 69.8/18 will react in this scenario, which doesn't have a
shorter prefix at the peer).

My customer is thinking of multi-homing again after this. Of course, it
wouldn't have saved the customer. The reason they left multi-homing is that
their network is in the same building and they only have one BGP router. I
don't think multiple paths would have saved them.

Opinions? Suggestions? Options?

-Jack

~We now return you to the 69/8 threads