North American Network Operators Group
Re: FYI - unproven technology
Ken, I don't think it was Bob Metcalfe... Note this telling extract from
USENET of just over a year back.

>------- Start of forwarded message -------
>Newsgroups: comp.dcom.lans.ethernet
>Path: cronkite.cisco.com!decwrl!parc!wirish
>From: [email protected] (Wes Irish)
>Subject: Performance problems on high utilization Ethernets
>Message-ID: <[email protected]>
>Summary: High utilization Ethernet performance problems traced to controller implementation bugs
>Keywords: Ethernet, communications, interframe gap, IFG, collisions, controller, interface, packet loss, data link
>Sender: [email protected]
>Organization: Xerox PARC
>Date: 16 Oct 93 00:36:23 GMT
>Lines: 115
>
>For the past year or so I have been investigating performance problems
>on the Ethernets here at PARC. This work has uncovered problems with a
>number of Ethernet controllers in common use today. These low-level
>controller problems can lead to serious performance problems for many of
>the systems involved. A full paper on this work, "Investigations into
>observed performance problems on high utilization Ethernet networks",
>will be released soon (initially as a PARC Blue & White report). But,
>since I have been giving talks on this work and news of it has begun to
>hit the Internet, I feel that I should post a preliminary report in
>order to reduce speculation and to make sure that the facts are
>correctly stated. Below is a short summary of some of the key facts and
>issues.
>
>The Ethernet specifications talk about making sure that transmitters
>enforce a 9.6 microsecond gap (IFG) between frames (packets). This is
>straightforward in the case of a gap following a just completed good
>packet. But, gaps following collision events are less straightforward.
>I do not want to debate the details of what is and is not "correct" in
>this case -- that is a discussion for another time and place. The
>reality of the situation is that there are a number of controllers in
>wide-spread use on networks today that do not interoperate very well in
>the face of collisions.
>
>In general, the problems arise when the gap following a collision is too
>short for a particular implementation of a receiver. In addition to
>uncovering controllers that simply generate short IFGs I have also
>uncovered a major implementation bug in a particular chip that injects
>short signal bursts onto the network. These bursts can damage the IFG
>"enforced" by other machines. Either way, the result is the same -- a
>short IFG preceding a packet which can result in a missed packet.
>
>It is important to note that when a controller misses a packet due to a
>short IFG THE FACT THAT THE PACKET WAS MISSED IS NOT DETECTED NOR
>REPORTED TO THE SYSTEM. System and driver statistics will claim no
>packets lost (unless some are lost for other reasons). Even most
>network analyzers are subject to the same undetected and therefore
>unreported packet loss. I have resorted to using a digital oscilloscope
>to capture and analyze these events.
>
>Let me emphasize that these problems are almost exclusively related to
>dealing with collision events. On a lightly loaded network, where
>collisions are few and far between, these problems are virtually
>non-existent. But these problems do indeed come into play on moderate
>to heavily loaded networks. Based on my observations a VERY ROUGH
>network load dividing line is about 25% load (using 0.1 or 1.0 sec
>samples).
>
>Here is an enumeration of some of the facts related to particular
>controllers that I have uncovered so far. There may be problems with
>other controllers but they may not appear on the networks that I have
>inspected.
>
>Controller: Intel 82586
>Commonly found in: SUN 3's and SUN 4's (ie interfaces), many other
>machines
>Problem: Can generate a short IFG following a collision
>Cause: starts IFG timer on CS dropout
>
>Controller: Intel 82596
>Commonly found in: Network General Sniffer using Cogent interface card
>Problem: Will not hear packet unless preceding IFG is 4.6 usec or larger
>
>Controller: SEEQ 8003
>Commonly found in: Cisco MEC and MCI interfaces, older SGI (Silicon
>Graphics) including 4D/35 and Indigo (but not Indigo2)
>Problem: Can generate a short IFG following a collision
>Cause: Starts 9.6 usec timer at end of its own jam and not end of
>collision
>Problem: Generates 24 bit signal burst onto network following some
>collisions. This burst lands in the IFG following the collision and
>will often result in two short IFGs resulting in other controllers
>missing the packet. NB: this can happen even if the chip has nothing to
>transmit!
>
>Controller: AMD 7990 "LANCE"
>Commonly found in: SUN SPARCStation machines (SS-1, SS-1+, SS-2, SS-10,
>...), many DEC machines, Cisco/SynOptics routers, Cisco IGS, many other
>machines
>Problem: Will not hear packet unless preceding IFG is 4.1 usec or larger
>Cause: implementation state machine
>Problem: many other problems including lock-up, transmit gaps greater
>than 9.6 usec under load, etc.
>Fix: A new version of the controller, the 79C90 CLANCE, fixes many of
>these problems but is not in common use like the LANCE.
>
>Interface chip: AT&T T7213
>Commonly found in: SUN SPARCStation 10 and other newer SUN machines
>Problem: Will hold the collision (and kill data) sent to the controller
>chip across IFGs of roughly 1.0 usec or less. It will also do this if a
>"manchester coding violation" is detected in a packet -- a job that
>should be left to the controller.
>
>
>The result of all of these implementation details is that it is very
>possible, even probable, to put together a network that results in
>"undetected" packet loss. Packet loss rates of even less than 1% can
>result in performance hits as high as 80%, depending on a multitude of
>factors including the protocols and implementations being used. I have
>clocked the potential packet drop rate at PARC due to these problems to
>be in the 1% - 5% range at times.
>
>I have been working with many of these vendors for a number of months
>now in an attempt to get these various bugs fixed so that different
>equipment interoperates properly. Most of the vendors have been very
>receptive to making things work now that they know there is a problem.
>Some have already identified solutions while others are still working on
>them.
>
>
>Wesley Irish
>Network Scientist
>Xerox PARC
>[email protected]
>
>[Please send any replies via e-mail as I do not normally read netnews]
>------- End of forwarded message -------

------------------------------------------------------------------------

>
> Curtis I'm not sure I understand your use of the term "unproven."
>
> In LAN circles we've been discussing this exact same phenomenon for the
> last 9 months (I raised it with Jessica as a potential explanation
> of some of the problems we were seeing in our early testing).
>
> Bob Metcalfe (coinventor of ethernet) discovered that some ethernet chip
> sets were also violating the inter-packet gap spec. A particular problem
> was that many of the devices used for sniffing themselves had the same
> chip sets and simply couldn't see what was happening to the packet
> stream (silent discards without errors signalled at the receiving end).
>
> He needed very expensive signal analysis hardware before
> the cause could be isolated.
>
> Ken Latta, Merit Network, Inc.
> NSFNET Project, Internet Engineering Group
> 1071 Beal, Ann Arbor, MI 48109-2103
> 313.936.2115 voice, 313.747.3745 fax
> [email protected], [email protected]
>
> > From: Curtis Villamizar <[email protected]>
> > To: [email protected]
> >
> > > FYI-
> >
> > For those that don't appreciate the consequences of using unproven
> > technology. The good news on Mae-East is packet loss is down to 15%
> > from 40%? :-(
> >
> > Congratulations to Sprint for picking a technology that is known to
> > work for the Sprint NAP. FDDI works. We'll see how the other NAPs
> > do, though I'm not encouraged by test results so far.
> >
> > Curtis
> >
> > BTW - this is Mae-East (the MFS bridged ethernet), not Mae-East+ (the
> > bridged FDDI).
> >
> > ------- Forwarded Message
> >
> > From: Sean Doran <[email protected]>
> > Reply-To: [email protected]
> > To: [email protected]
> > Subject: Moderately urgent: getting rid of annoying packet losses
> > Date: Wed, 19 Oct 1994 02:07:06 -0400
> > Sender: [email protected]
> >
> >
> > The Magnum boxes are *very* unhappy with inter-packet gaps of less
> > than about 23 microseconds, and drop back-to-back packets like
> > superheated rocks.
> >
> > We have a kludge which will help until the MFS hardware gets fixed.
> >
> > Those of you running one Cisco with EIP 10-0 microcode or better should
> > set the transmitter-delay of your MAE-EAST interface to 96 (0x60).
> > This will dramatically reduce the packet loss across MAE-EAST.
> >
> > IMPORTANT: Those of you who have more than one box on your ethernet
> > drop to MFS will need to a/ acquire EIP 171-1 from Cisco and load
> > it in then b/ set the transmitter-delay of each of your MAE-EAST
> > interfaces to 0x360 (864).
> >
> > The new microcode has apparently been well tested, and is doing the
> > right thing for icm-dc-1.icp.net and sl-dc-6.sprintlink.net (drops
> > to most of you have fallen from 40% to much less than 15%). It
> > works by assigning new meanings to the upper 8 bits of the transmitter-
> > delay value; this particular setting will delay the transfer of
> > the packet to the datalink controller when there is traffic
> > on the wire, then require an additional quiet time of 30usec,
> > after which there will be the standard 9.6 usec IEEE 802.3 delay.
> >
> > (The original intent apparently was to avoid drops when bursting
> > ethernet traffic encounters collisions by backing off on handing
> > the packet to the datalink layer; the application here is not quite
> > exactly what was intended, but definitely helps us).
> >
> > Each of your routers attached to MAE-EAST must run the new EIP 171-6
> > microcode and have the 0x360 transmitter-delay setting.
> >
> > Thanks to Robert M. Broberg of Cisco for the code.
> >
> > Those of you without Ciscos will have to come up with a similar hack
> > somehow.
> >
> > Sean.
> >
> > P.S.: We are *very* keen on PSI, NETCOM, and MCI to implement the
> > change, especially PSI. We aren't having problems with anyone
> > else we exchange traffic with at MAE-EAST (other than Dante
> > AS1133, but that's not a Cisco) but everyone would probably
> > benefit from the upgrade anyway. Try pinging each of your peers
> > in 192.41.177 a hundred or so times.
> >
> > - - --
> > Sean Doran <[email protected]> SprintLink/ICM engineering +1 703 904 2089
> >
> > ------- End of Forwarded Message
> >
>

--
--bill
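
For anyone checking the numbers quoted in this thread, here is a minimal sketch of the arithmetic. It assumes 10 Mb/s Ethernet, where one bit time is 0.1 usec, so the 9.6 usec IFG is 96 bit times. It also assumes the transmitter-delay value can be treated as a 16-bit quantity split into an upper and a lower byte. The thread only says the new microcode gives the upper 8 bits new meanings; the exact layout and the function names below are illustrative assumptions, not Cisco's documented format.

```python
# Sketch only: helper names and the 16-bit split are assumptions for illustration.

BIT_TIMES_PER_USEC = 10  # 10 Mb/s Ethernet: 10 bits per microsecond


def usec_to_bit_times(usec):
    """Convert a gap in microseconds to whole 10 Mb/s bit times."""
    return round(usec * BIT_TIMES_PER_USEC)


def split_transmitter_delay(value):
    """Return (upper 8 bits, lower 8 bits) of a value treated as 16 bits wide."""
    return (value >> 8) & 0xFF, value & 0xFF


if __name__ == "__main__":
    # Gap figures quoted in the thread, expressed in bit times.
    gaps = [
        ("standard IFG", 9.6),
        ("82596 minimum audible IFG", 4.6),
        ("LANCE minimum audible IFG", 4.1),
        ("Magnum threshold (approx.)", 23.0),
    ]
    for label, usec in gaps:
        print(f"{label}: {usec} usec = {usec_to_bit_times(usec)} bit times")

    # The two transmitter-delay settings recommended in the thread.
    for value in (0x60, 0x360):
        upper, lower = split_transmitter_delay(value)
        print(f"{value:#x} ({value}): upper byte {upper:#x}, lower byte {lower}")
```

Under that assumed split, both recommended settings share the same lower byte (96), and 0x360 differs only in its upper byte, which is consistent with the description of the new microcode reinterpreting the upper bits.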