North American Network Operators Group


Re: Followup British Telecom outage reason

  • From: Sean Donelan
  • Date: Sat Nov 24 14:17:21 2001

On Sat, 24 Nov 2001, Neil J. McRae wrote:
> I'd be surprised if it was the GSR, and in any case that doesn't
> absolve anyone. If it was a software issue, why wasn't the software
> properly tested? Why was such a critical upgrade rolled out across
> the entire network at the same time? It doesn't add up.

It appears to be yet another CEF bug.  If you want to use a GSR
you are stuck using some version of IOS with a CEF bug.  The
question is which bug do you want.  Each version of IOS has
a slightly different set.  Several US network providers have also
been bitten by CEF bugs.

While trying to fix one set of bugs, BT upgraded their network.
I'm not sure if they were upgrading at 9am, or had upgraded
earlier and the bug finally came out under load at 9am.
When the BT network melted down, Cisco suggested installing a
different version of IOS, which had previously been tested.  At
noon, BT found the new version had an even worse bug, sending packets
out the wrong interface.  It was not until 2200 (13 hours later) that
BT and Cisco found a version of IOS which stabilized the network.
"Stabilized," not fixed.  The running version of IOS still has a bug,
but it isn't as severe.