North American Network Operators Group


Re: Spanning tree melt down ?

  • From: Stephen J. Wilcox
  • Date: Thu Nov 28 16:47:35 2002

Heh, so they kept bolting stuff on, and a failure somewhere caused a spanning
tree change which, because of overcomplexity and out-of-date config, was unable
to converge.

Ah yes, Occam also applies to switch topology :)

Steve

On Fri, 29 Nov 2002, Simon Lyall wrote:

> 
> On Thu, 28 Nov 2002, Garrett Allen wrote:
> > speculating on cause and effect, my first bet would be that someone turned
> > off spanning tree on a trunk or trunks immediately prior to the flood.  my next
> > bet would be a babbling device - i've seen an unauthorized hub on a flat
> > layer 2 net basically shut the network down.  it was after a power hit.
> > when we found the bugger and power cycled it, all was well.  i don't think
> > that the researcher was the culprit.  more likely the victim.
> 
> This article had some more information:
> 
> http://www.nwfusion.com/news/2002/1125bethisrael.html
> 
> This slashdot article also seems to have some details:
> 
> http://slashdot.org/comments.pl?sid=46238&cid=4770093
> 
> Text as follows:
> 
>  I contacted Dr. John D. Halamka to see if he could provide more detail on
> the network outage. Dr. Halamka is the chief information officer for
> CareGroup Health System, the parent company of the Beth Israel Deaconess
> medical center. His reply is as follows: "Here's the technical explanation
> for you. When TAC was first able to access and assess the network, we
> found the Layer 2 structure of the network to be unstable and out of
> specification with 802.1d standards. The management VLAN (VLAN 1) was in
> some locations 10 Layer 2 hops from root. The conservative default values
> for the Spanning Tree Protocol (STP) impose a maximum network diameter of
> seven. This means that two distinct bridges in the network should not be
> more than seven hops away from each other. Part of this restriction
> comes from the age field that Bridge Protocol Data Units (BPDUs) carry: when
> a BPDU is propagated from the root bridge towards the leaves of the tree,
> the age field is incremented each time it goes through a bridge.
> Eventually, when the age field of a BPDU goes beyond max age, it is
> discarded. Typically, this will occur if the root is too far away from
> some bridges of the network. This issue will impact convergence of the
> spanning tree. A major contributor to this STP issue was the PACS network
> and its connection to the CareGroup network. To eliminate its influence on
> the CareGroup network, we isolated it with a Layer 3 boundary. All
> redundancy in the network was removed to ensure no STP loops were
> possible. Full connectivity was restored to remote devices and networks
> that were disconnected in troubleshooting efforts prior to TAC's
> involvement. Redundancy was returned between the core campus devices.
> Spanning Tree was stabilized and localized issues were pursued. Thanks for
> your support. CIO Magazine will devote the February issue to this event
> and Harvard Business School is doing a case study."
> 
> 
>  --
> Simon Lyall.                |  Newsmaster  | Work: [email protected]
> Senior Network/System Admin |  Postmaster  | Home: [email protected]
> ihug, Auckland, NZ          | Asst Doorman | Web: http://www.darkmere.gen.nz
> 
>
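For anyone who wants to play with the hop-count and message-age reasoning in
the quoted TAC explanation, here is a minimal Python sketch. It is not taken
from the thread and makes several assumptions: the topology is a made-up chain
of switches named sw0..sw10, "hops" are counted as links from the root (the
wording used in the quote), each bridge is assumed to add one second to the
message age when it relays a BPDU, and max_age_for() uses the timer tuning
formula as I recall it from Cisco's STP timer documentation, so treat that
formula as an assumption too.

```python
from collections import deque

DEFAULT_MAX_AGE = 20   # seconds, 802.1D default
DEFAULT_HELLO = 2      # seconds, 802.1D default
HOP_LIMIT = 7          # the "maximum network diameter of seven" from the quote


def hops_from_root(links, root):
    """Breadth-first search: number of links from root to each reachable bridge."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    hops = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


def message_age_after(relays, increment=1):
    """Message age of a BPDU after `relays` bridges (assumed +1 s per relay)."""
    return relays * increment


def bpdu_discarded(relays, max_age=DEFAULT_MAX_AGE, increment=1):
    """True once the assumed message age has reached max age."""
    return message_age_after(relays, increment) >= max_age


def max_age_for(diameter, hello=DEFAULT_HELLO):
    """Max age implied by a target diameter (assumed formula: 4*hello + 2*dia - 2)."""
    return 4 * hello + 2 * diameter - 2


if __name__ == "__main__":
    # Hypothetical chain: sw0 is the root, sw10 is 10 links away.
    chain = [(f"sw{i}", f"sw{i + 1}") for i in range(10)]
    for name, h in sorted(hops_from_root(chain, "sw0").items(), key=lambda kv: kv[1]):
        status = "over limit" if h > HOP_LIMIT else "ok"
        print(f"{name}: {h} hops from root, message age ~{message_age_after(h)}s, {status}")
```

Under these assumptions the age check by itself would not discard a BPDU at 10
hops with the default max age. The limit of seven reads more as the diameter
that the default timers were budgeted for, so a deeper tree is out of spec even
before any BPDU actually ages out.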