North American Network Operators Group

Re: FYI - unproven technology

  • From: bmanning
  • Date: Wed Oct 19 18:32:30 1994
  • Posted-date: Wed, 19 Oct 1994 15:31:27 -0700 (PDT)

Ken, I don't think it was Bob Metcalfe... Note this telling extract from
USENET of just over a year back.

>------- Start of forwarded message -------
>Newsgroups: comp.dcom.lans.ethernet
>Path: cronkite.cisco.com!decwrl!parc!wirish
>From: [email protected] (Wes Irish)
>Subject: Performance problems on high utilization Ethernets
>Message-ID: <[email protected]>
>Summary: High utilization Ethernet performance problems traced to controller implementation bugs
>Keywords: Ethernet, communications, interframe gap, IFG, collisions, controller, interface, packet loss, data link
>Sender: [email protected]
>Organization: Xerox PARC
>Date: 16 Oct 93 00:36:23 GMT
>Lines: 115
>
>For the past year or so I have been investigating performance problems
>on the Ethernets here at PARC.  This work has uncovered problems with a
>number of Ethernet controllers in common use today.  These low-level
>controller problems can lead to serious performance problems for many of
>the systems involved.  A full paper on this work, "Investigations into
>observed performance problems on high utilization Ethernet networks",
>will be released soon (initially as a PARC Blue & White report).  But,
>since I have been giving talks on this work and news of it has begun to
>hit the Internet, I feel that I should post a preliminary report in
>order to reduce speculation and to make sure that the facts are
>correctly stated.  Below is a short summary of some of the key facts and
>issues.
>
>The Ethernet specifications talk about making sure that transmitters
>enforce a 9.6 microsecond gap (IFG) between frames (packets).  This is
>straightforward in the case of a gap following a just-completed good
>packet.  But, gaps following collision events are less straightforward.
>I do not want to debate the details of what is and is not "correct" in
>this case -- that is a discussion for another time and place.  The
>reality of the situation is that there are a number of controllers in
>widespread use on networks today that do not interoperate very well in
>the face of collisions.
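For readers who think in bit times rather than microseconds: at 10 Mb/s one bit lasts 100 ns, so the 9.6 microsecond IFG is 96 bit times. The sketch below assumes 10 Mb/s Ethernet and uses integer nanoseconds to avoid floating-point rounding. The function names are illustrative only.

```python
# A sketch relating the 9.6 us interframe gap (IFG) to bit times,
# assuming 10 Mb/s Ethernet. Integer nanoseconds avoid float rounding.
RATE_BPS = 10_000_000  # 10 Mb/s


def bit_time_ns(rate_bps: int = RATE_BPS) -> int:
    """Duration of one bit on the wire, in nanoseconds (100 ns at 10 Mb/s)."""
    return 1_000_000_000 // rate_bps


def ns_to_bit_times(gap_ns: int, rate_bps: int = RATE_BPS) -> float:
    """Express a gap measured in nanoseconds as a number of bit times."""
    return gap_ns / bit_time_ns(rate_bps)


if __name__ == "__main__":
    print("bit time:", bit_time_ns(), "ns")                     # 100 ns
    print("9.6 us IFG =", ns_to_bit_times(9_600), "bit times")  # 96.0
```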
>
>In general, the problems arise when the gap following a collision is too
>short for a particular implementation of a receiver.  In addition to
>uncovering controllers that simply generate short IFGs I have also
>uncovered a major implementation bug in a particular chip that injects
>short signal bursts onto the network.  These bursts can damage the IFG
>"enforced" by other machines.  Either way, the result is the same -- a
>short IFG preceding a packet which can result in a missed packet.
>
>It is important to note that when a controller misses a packet due to a
>short IFG THE FACT THAT THE PACKET WAS MISSED IS NOT DETECTED NOR
>REPORTED TO THE SYSTEM.  System and driver statistics will claim no
>packets lost (unless some are lost for other reasons).  Even most
>network analyzers are subject to the same undetected and therefore
>unreported packet loss.  I have resorted to using a digital oscilloscope
>to capture and analyze these events.
>
>Let me emphasize that these problems are almost exclusively related to
>dealing with collision events.  On a lightly loaded network, where
>collisions are few and far between, these problems are virtually
>non-existent.  But these problems do indeed come into play on moderate
>to heavily loaded networks.  Based on my observations a VERY ROUGH
>network load dividing line is about 25% load (using 0.1 or 1.0 sec
>samples).
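The rough 25% dividing line is measured over short sample windows. The sketch below shows one way to compute per-window load from byte counts. It assumes 10 Mb/s and counts only frame bytes, ignoring preamble and gaps, so it slightly understates wire occupancy. The sample counts are hypothetical.

```python
# A sketch of per-window Ethernet utilization, assuming 10 Mb/s and counting
# only frame bytes (preamble and gaps are ignored, so wire occupancy is
# slightly understated). The 25% figure is Irish's "VERY ROUGH" line.
RATE_BPS = 10_000_000
ROUGH_THRESHOLD = 0.25


def utilization(bytes_seen: int, interval_s: float, rate_bps: int = RATE_BPS) -> float:
    """Fraction of link capacity carried in one sample window."""
    return (bytes_seen * 8) / (rate_bps * interval_s)


def windows_over_threshold(byte_counts, interval_s, threshold=ROUGH_THRESHOLD):
    """Indices of sample windows whose load exceeds the threshold."""
    return [i for i, b in enumerate(byte_counts)
            if utilization(b, interval_s) > threshold]


if __name__ == "__main__":
    # Hypothetical byte counts from four 0.1 s samples.
    samples = [10_000, 40_000, 90_000, 20_000]
    print(windows_over_threshold(samples, 0.1))  # [1, 2]
```

A 0.1 s window at 10 Mb/s holds 125,000 bytes, so 25% corresponds to 31,250 bytes per window.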
>
>Here is an enumeration of some of the facts related to particular
>controllers that I have uncovered so far.  There may be problems with
>other controllers but they may not appear on the networks that I have
>inspected.
>
>Controller: Intel 82586
>Commonly found in: SUN 3's and SUN 4's (ie interfaces), many other
>machines
>Problem: Can generate a short IFG following a collision
>Cause: starts IFG timer on CS dropout
>
>Controller: Intel 82596
>Commonly found in: Network General Sniffer using Cogent interface card
>Problem: Will not hear packet unless preceding IFG is 4.6 usec or larger
>
>Controller: SEEQ 8003
>Commonly found in: Cisco MEC and MCI interfaces, older SGI (Silicon
>Graphics) including 4D/35 and Indigo (but not Indigo2)
>Problem: Can generate a short IFG following a collision
>Cause: Starts 9.6 usec timer at end of its own jam and not end of
>collision
>Problem: Generates 24 bit signal burst onto network following some
>collisions.  This burst lands in the IFG following the collision and
>will often result in two short IFGs resulting in other controllers
>missing the packet.  NB: this can happen even if the chip has nothing to
>transmit!
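A 24-bit burst lasts 2.4 microseconds at 10 Mb/s. If it lands inside a 9.6 microsecond gap, a receiver sees two gaps that together total only 7.2 microseconds. The sketch below computes the two pieces. Where the burst starts within the gap (`offset_ns`) is a hypothetical parameter, not a value from the report.

```python
# A sketch of how a 24-bit burst inside a 9.6 us gap leaves two shorter
# gaps. Assumes 10 Mb/s (100 ns per bit); where the burst lands within
# the gap (offset_ns) is a hypothetical parameter, not a measured value.
BIT_TIME_NS = 100
IFG_NS = 9_600      # gap enforced by the next transmitter
BURST_BITS = 24     # SEEQ 8003 burst length from the report


def split_gap(offset_ns: int, gap_ns: int = IFG_NS, burst_bits: int = BURST_BITS):
    """Return (gap before burst, gap after burst) in ns."""
    burst_ns = burst_bits * BIT_TIME_NS
    if not 0 <= offset_ns <= gap_ns - burst_ns:
        raise ValueError("burst must fit inside the gap")
    return offset_ns, gap_ns - burst_ns - offset_ns


if __name__ == "__main__":
    before, after = split_gap(4_000)
    # The packet that follows is preceded by only `after` ns of quiet.
    print(before, after)  # 4000 3200
```

With the burst 4.0 microseconds into the gap, the packet that follows is preceded by only 3.2 microseconds of quiet. That is below the 4.1 microsecond minimum reported for the LANCE.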
>
>Controller: AMD 7990 "LANCE"
>Commonly found in: SUN SPARCStation machines (SS-1, SS-1+, SS-2, SS-10,
>...), many DEC machines, Cisco/SynOptics routers, Cisco IGS, many other
>machines
>Problem: Will not hear packet unless preceding IFG is 4.1 usec or larger
>Cause: implementation state machine
>Problem: many other problems including lock-up, transmit gaps greater
>than 9.6 usec under load, etc.
>Fix: A new version of the controller, the 79C90 CLANCE, fixes many of
>these problems but is not in common use like the LANCE.
>
>Interface chip: AT&T T7213
>Commonly found in: SUN SPARCStation 10 and other newer SUN machines
>Problem: Will hold the collision (and kill data) sent to the controller
>chip across IFGs of roughly 1.0 usec or less.  It will also do this if a
>"manchester coding violation" is detected in a packet -- a job that
>should be left to the controller.
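The two receive-side minimums in the list above (4.6 usec for the 82596, 4.1 usec for the LANCE) can be checked mechanically against a given gap. The sketch below covers only those two chips. Its table and function name are illustrative, not part of the report.

```python
# A sketch checking which receivers from Irish's list would miss a packet
# preceded by a given gap. Thresholds are the "will not hear packet unless
# preceding IFG is X usec or larger" figures; other chips are omitted.
MIN_RX_IFG_NS = {
    "Intel 82596": 4_600,
    "AMD 7990 LANCE": 4_100,
}


def receivers_that_miss(ifg_ns: int) -> list:
    """Names of receivers whose minimum preceding IFG exceeds ifg_ns."""
    return sorted(name for name, min_ns in MIN_RX_IFG_NS.items() if ifg_ns < min_ns)


if __name__ == "__main__":
    for gap in (9_600, 4_300, 3_200):
        print(gap, receivers_that_miss(gap))
```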
>
>
>The result of all of these implementation details is that it is very
>possible, even probable, to put together a network that results in
>"undetected" packet loss.  Packet loss rates of even less than 1% can
>result in performance hits as high as 80%, depending on a multitude of
>factors including the protocols and implementations being used.  I have
>clocked the potential packet drop rate at PARC due to these problems to
>be in the 1% - 5% range at times.
>
>I have been working with many of these vendors for a number of months
>now in an attempt to get these various bugs fixed so that different
>equipment interoperates properly.  Most of the vendors have been very
>receptive to making things work now that they know there is a problem.
>Some have already identified solutions while others are still working on
>them.
>
>
>Wesley Irish
>Network Scientist
>Xerox PARC
>[email protected]
>
>[Please send any replies via e-mail as I do not normally read netnews]
>------- End of forwarded message -------



------------------------------------------------------------------------
> 
> Curtis, I'm not sure I understand your use of the term "unproven."
> 
> In LAN circles we've been discussing this same phenomenon for the
> last 9 months (I raised it with Jessica as a potential explanation
> of some of the problems we were seeing in our early testing).
> 
> Bob Metcalfe (co-inventor of Ethernet) discovered that some Ethernet chip
> sets were also violating the inter-packet gap spec. A particular problem
> was that many of the devices used for sniffing themselves had the same
> chip sets and simply couldn't see what was happening to the packet
> stream (silent discards without errors signalled at the receiving end).
> 
> He needed very expensive signal analysis hardware before
> the cause could be isolated.
> 
> Ken Latta, Merit Network, Inc.
> NSFNET Project, Internet Engineering Group
> 1071 Beal, Ann Arbor, MI 48109-2103
> 313.936.2115 voice,  313.747.3745 fax
> [email protected], [email protected]
> 
> > From:    Curtis Villamizar <[email protected]>
> > To:      [email protected]
> 
> > 
> > FYI-
> > 
> > For those that don't appreciate the consequences of using unproven
> > technology.  The good news on Mae-East is packet loss is down to 15%
> > from 40%?  :-(
> > 
> > Congratulations to Sprint for picking a technology that is known to
> > work for the Sprint NAP.  FDDI works.  We'll see how the other NAPs
> > do, though I'm not encouraged by test results so far.
> > 
> > Curtis
> > 
> > BTW - this is Mae-East (the MFS bridged ethernet), not Mae-East+ (the
> > bridged FDDI).
> > 
> > ------- Forwarded Message
> > 
> > From: Sean Doran <[email protected]>
> > Reply-To: [email protected]
> > To: [email protected]
> > Subject: Moderately urgent: getting rid of annoying packet losses
> > Date: Wed, 19 Oct 1994 02:07:06 -0400
> > Sender: [email protected]
> > 
> > 
> > The Magnum boxes are *very* unhappy with inter-packet gaps of less
> > than about 23 microseconds, and drop back-to-back packets like
> > superheated rocks.
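To put the roughly 23 microsecond tolerance in context, the sketch below compares it with the standard 9.6 microsecond gap. It computes the back-to-back rate of minimum-size (64-byte) frames under each gap and assumes 10 Mb/s with an 8-byte preamble/SFD per frame. The function names are illustrative.

```python
# A sketch of what the Magnum's ~23 us minimum gap costs, assuming 10 Mb/s
# Ethernet and an 8-byte preamble/SFD ahead of each frame.
BIT_TIME_NS = 100
PREAMBLE_SFD_BYTES = 8
STANDARD_IFG_NS = 9_600
MAGNUM_MIN_GAP_NS = 23_000   # "about 23 microseconds", per Sean Doran's message


def frame_time_ns(frame_bytes: int) -> int:
    """Time on the wire for one frame including preamble/SFD."""
    return (frame_bytes + PREAMBLE_SFD_BYTES) * 8 * BIT_TIME_NS


def max_frames_per_second(frame_bytes: int, gap_ns: int) -> int:
    """Back-to-back frame rate for a fixed frame size and gap."""
    return 1_000_000_000 // (frame_time_ns(frame_bytes) + gap_ns)


def magnum_likely_drops(gap_ns: int) -> bool:
    """True if the gap is below the reported Magnum tolerance."""
    return gap_ns < MAGNUM_MIN_GAP_NS


if __name__ == "__main__":
    for gap in (STANDARD_IFG_NS, MAGNUM_MIN_GAP_NS):
        print(gap, max_frames_per_second(64, gap), magnum_likely_drops(gap))
```

With a standard gap, a sender can emit about 14,880 minimum-size frames per second, and every one of those gaps falls below the Magnum's tolerance. Widening the gap to 23 microseconds lowers the ceiling to about 12,406 frames per second.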
> > 
> > We have a kludge which will help until the MFS hardware gets fixed.
> > 
> > Those of you running one Cisco with EIP 10-0 microcode or better should
> > set the transmitter-delay of your MAE-EAST interface to 96 (0x60).
> > This will dramatically reduce the packet loss across MAE-EAST.
> > 
> > IMPORTANT: Those of you who have more than one box on your ethernet
> > drop to MFS will need to a/ acquire EIP 171-1 from Cisco and load
> > it in then b/ set the transmitter-delay of each of your MAE-EAST
> > interfaces to 0x360 (864).
> > 
> > The new microcode has apparently been well tested, and is doing the
> > right thing for icm-dc-1.icp.net and sl-dc-6.sprintlink.net (drops
> > to most of you have fallen from 40% to much less than 15%).  It
> > works by assigning new meanings to the upper 8 bits of the transmitter-
> > delay value; this particular setting will delay the transfer of
> > the packet to the datalink controller when there is traffic
> > on the wire, then require an additional quiet time of 30usec, 
> > after which there will be the standard 9.6 usec IEEE 802.3 delay.
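The 0x360 setting splits into an upper byte of 0x03 and a lower byte of 0x60. The lower byte is 96, the same value recommended for single-router sites. Taking the description at face value, 30 usec of quiet plus the 9.6 usec IEEE gap gives at least 39.6 usec, which is above the Magnum's reported 23 usec tolerance. The sketch below assumes a 16-bit field. It does not model how the upper byte maps to 30 usec, because the message does not say.

```python
# A sketch decoding the 0x360 transmitter-delay value described above.
# ASSUMPTION: the value is treated as a 16-bit field; the email does not
# say how the upper byte maps to the 30 us quiet time, so that mapping is
# not modeled here.
IEEE_IFG_NS = 9_600
QUIET_TIME_NS = 30_000       # extra quiet time described for 0x360
MAGNUM_MIN_GAP_NS = 23_000   # Magnum tolerance from the same message


def split_transmitter_delay(value: int) -> tuple:
    """Split a 16-bit value into (upper 8 bits, lower 8 bits)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    return value >> 8, value & 0xFF


def min_gap_after_traffic_ns() -> int:
    """Quiet time plus the standard IFG, as described in the message."""
    return QUIET_TIME_NS + IEEE_IFG_NS


if __name__ == "__main__":
    upper, lower = split_transmitter_delay(0x360)
    print(hex(upper), hex(lower), lower)                   # 0x3 0x60 96
    print(min_gap_after_traffic_ns() > MAGNUM_MIN_GAP_NS)  # True
```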
> > 
> > (The original intent apparently was to avoid drops when bursting
> > ethernet traffic encounters collisions by backing off on handing
> > the packet to the datalink layer; the application here is not quite
> > exactly what was intended, but definitely helps us).
> > 
> > Each of your routers attached to MAE-EAST must run the new EIP 171-6
> > microcode and have the 0x360 transmitter-delay setting.
> > 
> > Thanks to Robert M. Broberg of Cisco for the code.
> > 
> > Those of you without Ciscos will have to come up with a similar hack 
> > somehow.
> > 
> > 	Sean.
> > 
> > P.S.: We are *very* keen on PSI, NETCOM, and MCI to implement the
> >       change, especially PSI.  We aren't having problems with anyone
> >       else we exchange traffic with at MAE-EAST (other than Dante
> >       AS1133, but that's not a Cisco) but everyone would probably 
> >       benefit from the upgrade anyway.  Try pinging each of your peers
> >       in 192.41.177 a hundred or so times.
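After pinging each peer about a hundred times, the sent and received counts can be reduced to a loss percentage per peer. The sketch below ranks the peers by loss. The peer names and counts are hypothetical placeholders, not measurements.

```python
# A sketch for summarizing ping results per peer. The peer names and
# counts below are hypothetical placeholders, not measurements.
def loss_percent(sent: int, received: int) -> float:
    """Percentage of echo requests that got no reply."""
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("need sent > 0 and 0 <= received <= sent")
    return 100.0 * (sent - received) / sent


def lossy_peers(results: dict, threshold_pct: float = 1.0) -> list:
    """Peers whose loss exceeds threshold_pct, worst first."""
    losses = {peer: loss_percent(s, r) for peer, (s, r) in results.items()}
    return sorted((p for p, loss in losses.items() if loss > threshold_pct),
                  key=lambda p: losses[p], reverse=True)


if __name__ == "__main__":
    results = {"peer-a": (100, 60), "peer-b": (100, 100), "peer-c": (100, 85)}
    print(lossy_peers(results))  # ['peer-a', 'peer-c']
```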
> > 
> > - - --
> > Sean Doran <[email protected]>  SprintLink/ICM engineering   +1 703 904 2089
> > 
> > ------- End of Forwarded Message
> > 
> 


-- 
--bill