North American Network Operators Group

Re: Spanning tree melt down ?

  • From: blitz
  • Date: Fri Nov 29 01:45:59 2002


Smells like it to me...sounds like they said, "HALP" to Cisco, and Cisco said, "Clean out the warehouse, we've got a live one!"

At 16:08 11/28/02 -0600, you wrote:

I'm still failing to see why this required a $3M forklift of new equipment
to correct the problem. Was this just Cisco sales pouncing on someone's
misfortune as a way to push new stuff?

On Thu, 28 Nov 2002, Stephen J. Wilcox wrote:

>
> Heh, so they kept bolting stuff on, and a failure somewhere caused a spanning
> tree change which, because of over-complexity and out-of-date config, was
> unable to converge.
>
> Ah yes, Occam also applies to switch topology :)
>
> Steve
>
> On Fri, 29 Nov 2002, Simon Lyall wrote:
>
> >
> > On Thu, 28 Nov 2002, Garrett Allen wrote:
> > > speculating on cause and effect, my first bet would be that someone turned off
> > > spanning tree on a trunk or trunks immediately prior to the flood. my next
> > > bet would be a babbling device - i've seen an unauthorized hub on a flat
> > > layer 2 net basically shut the network down. it was after a power hit.
> > > when we found the bugger and power cycled it, all was well. i don't think
> > > that the researcher was the culprit. more likely the victim.
> >
> > This article had some more information:
> >
> > http://www.nwfusion.com/news/2002/1125bethisrael.html
> >
> > This slashdot article also seems to have some details:
> >
> > http://slashdot.org/comments.pl?sid=46238&cid=4770093
> >
> > Text as follows:
> >
> > I contacted Dr. John D. Halamka to see if he could provide more detail on
> > the network outage. Dr. Halamka is the chief information officer for
> > CareGroup Health System, the parent company of the Beth Israel Deaconess
> > medical center. His reply is as follows: "Here's the technical explanation
> > for you. When TAC was first able to access and assess the network, we
> > found the Layer 2 structure of the network to be unstable and out of
> > specification with 802.1d standards. The management vlan (vlan 1) had in
> > some locations 10 Layer 2 hops from root. The conservative default values
> > for the Spanning Tree Protocol (STP) impose a maximum network diameter of
> > seven. This means that two distinct bridges in the network should not be
> > more than seven hops away from each other. Part of this restriction
> > comes from the age field that Bridge Protocol Data Units (BPDUs) carry: when
> > a BPDU is propagated from the root bridge towards the leaves of the tree,
> > the age field is incremented each time it goes through a bridge.
> > Eventually, when the age field of a BPDU goes beyond max age, it is
> > discarded. Typically, this will occur if the root is too far away from
> > some bridges of the network. This issue will impact convergence of the
> > spanning tree. A major contributor to this STP issue was the PACS network
> > and its connection to the CareGroup network. To eliminate its influence on
> > the CareGroup network, we isolated it with a Layer 3 boundary. All
> > redundancy in the network was removed to ensure no STP loops were
> > possible. Full connectivity was restored to remote devices and networks
> > that were disconnected in troubleshooting efforts prior to TAC's
> > involvement. Redundancy was returned between the core campus devices.
> > Spanning Tree was stabilized and localized issues were pursued. Thanks for
> > your support. CIO Magazine will devote the February issue to this event
> > and Harvard Business School is doing a case study."
> >
> >
> > --
> > Simon Lyall. | Newsmaster | Work: [email protected]
> > Senior Network/System Admin | Postmaster | Home: [email protected]
> > ihug, Auckland, NZ | Asst Doorman | Web: http://www.darkmere.gen.nz
> >
> >
>
>
>
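For anyone wanting to check the "diameter of seven" arithmetic in the quoted explanation, here is a rough sketch. It assumes the commonly cited timer relationships from Cisco's STP timer tuning guidance (max_age = 4*hello + 2*diameter - 2, forward_delay = (4*hello + 3*diameter) / 2) and the default 2-second hello. The function names are made up for illustration, and treating "10 hops from root" as the diameter is itself an assumption; the real diameter could be larger.

```python
import math

def stp_timers(diameter, hello=2):
    """Return (max_age, forward_delay) in seconds for a given network diameter.

    Assumes the commonly cited Cisco tuning formulas; not taken from the thread.
    """
    max_age = 4 * hello + 2 * diameter - 2
    # Timers are configured in whole seconds, so round the half-second up.
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)
    return max_age, forward_delay

def max_diameter(max_age, hello=2):
    """Invert the max_age formula: largest diameter a given max_age supports."""
    return (max_age + 2 - 4 * hello) // 2

if __name__ == "__main__":
    # Default timers correspond to a diameter of seven.
    print("diameter 7  ->", stp_timers(7))
    # The 10 hops reported on vlan 1 would need larger timers than the defaults.
    print("diameter 10 ->", stp_timers(10))
    print("max_age 20  -> diameter", max_diameter(20))
```

Under those assumptions the default max age of 20 seconds works out to exactly seven hops, while a 10-hop path would call for a max age of about 26 seconds, which fits the description of a network that was out of spec on default timers.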