North American Network Operators Group

Re: Thoughts on increasing MTUs on the internet

  • From: Douglas Otis
  • Date: Sat Apr 14 23:52:14 2007


On Apr 14, 2007, at 1:10 PM, Iljitsch van Beijnum wrote:
On 14-apr-2007, at 19:22, Douglas Otis wrote:

1500 byte MTUs in fact work. I'm all for 9K MTUs, and would recommend them. I don't see the point of 65K MTUs.

Keep in mind that a 9KB MTU still reduces the Ethernet CRC effectiveness by a fair amount.

I can't find bit error rate specs for various types of Ethernet real quick, but if you assume 10^-9 that means that ~1 in 10,000 packets of 11454 bytes has one bit error, so around 1 in 10^12 has three bit errors and has a _chance_ to defeat the CRC32. The naive assumption that only 1 in 2^32 of those packets with three flipped bits will have a valid CRC32 is probably incorrect, but the CRC should still catch most of those packets for a fairly large value of "most".

http://www.ietf.org/rfc/rfc3385.txt
http://citeseer.ist.psu.edu/koopman02bit.html


For 1500 byte packets the fraction of packets with three bits flipped would be around 1 : 10^15. Correcting for the larger number of packets per given amount of data, that's a difference of about 1 : 100.
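A rough sketch of the arithmetic in the estimate above, assuming independent bit errors at a BER of 10^-9. The function names are made up for illustration. The first function mirrors the back-of-envelope shortcut (raise the one-bit-error probability to the third power); the second is the binomial probability of exactly three errors, which comes out about six times smaller, so the quoted figures are best read as orders of magnitude.

```python
import math

BER = 1e-9  # assumed bit error rate, taken from the estimate above


def p_errors_rough(packet_bytes, k, ber=BER):
    """Back-of-envelope estimate: (bits per packet * BER) ** k."""
    return (packet_bytes * 8 * ber) ** k


def p_errors_binomial(packet_bytes, k, ber=BER):
    """Binomial probability of exactly k independent bit errors in one packet."""
    n = packet_bytes * 8
    return math.comb(n, k) * ber**k * (1 - ber) ** (n - k)


large = p_errors_rough(11454, 3)  # jumbo-sized packet
small = p_errors_rough(1500, 3)   # standard Ethernet MTU

# Normalise per byte carried, since small packets mean more packets
# for the same amount of data.
ratio_per_byte = (large / 11454) / (small / 1500)

print(f"11454-byte packets, three bit errors: {large:.2e}")
print(f"1500-byte packets, three bit errors:  {small:.2e}")
print(f"per-byte ratio (large vs small):      {ratio_per_byte:.0f}")
print(f"binomial value for 11454 bytes:       {p_errors_binomial(11454, 3):.2e}")
```

Without the rounding to 10^-12 and 10^-15, the per-byte ratio works out to a few tens rather than 100, which is still in the same ballpark as the figure quoted above.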


Quoting from "When The CRC and TCP Checksum Disagree" by Jonathan Stone and Craig Partridge:


http://citeseer.ist.psu.edu/cache/papers/cs/21401/http:zSzzSzsigcomm.it.uu.sezSzconfzSzpaperzSzsigcomm2000-9-1.pdf/stone00when.pdf

"Traces of Internet packets from the past two years show that between 1 packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even on links where link-level CRCs should catch all but 1 in 4 billion errors. For certain situations, the rate of checksum failures can be even higher: in one hour-long test we observed a checksum failure of 1 packet in 400. We investigate why so many errors are observed, when link-level CRCs should catch nearly all of them.

We have collected nearly 500,000 packets which failed the TCP or UDP or IP checksum. This dataset shows the Internet has a wide variety of error sources which can not be detected by link-level checks. We describe analysis tools that have identified nearly 100 different error patterns. Categorizing packet errors, we can infer likely causes which explain roughly half the observed errors. The causes span the entire spectrum of a network stack, from memory errors to bugs in TCP.

After an analysis we conclude that the checksum will fail to detect errors for roughly 1 in 16 million to 10 billion packets. From our analysis of the cause of errors, we propose simple changes to several protocols which will decrease the rate of undetected error. Even so, the highly non-random distribution of errors strongly suggests some applications should employ application-level checksums or equivalents."

Hardware weaknesses within DSLAMs or various memory arrays, such as a weak driver on some internal interface, can generate high levels of multi-bit errors that TCP checksums do not detect. When such errors affect the same bit within an interface, more than 1 out of 100 may go undetected.
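To illustrate one way same-bit errors can slip past the TCP checksum, here is a minimal sketch. It assumes a 16-bit-aligned fault that clears a given bit in one word and sets the same bit in another; the payload bytes are arbitrary and this is not a model of any particular DSLAM or memory fault. Because the Internet checksum (RFC 1071) is a ones-complement sum of 16-bit words, the two changes cancel. zlib.crc32, which uses the same generator polynomial as the Ethernet FCS, is shown for comparison.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: ones-complement of the ones-complement 16-bit sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = bytes([0x12, 0x34, 0x56, 0x78])

# Same bit position (0x04 of the low byte) hit in two different words:
# cleared in the first word (0x34 -> 0x30), set in the second (0x78 -> 0x7C).
corrupted = bytes([0x12, 0x30, 0x56, 0x7C])

same_checksum = internet_checksum(original) == internet_checksum(corrupted)
same_crc = zlib.crc32(original) == zlib.crc32(corrupted)

print("TCP-style checksum unchanged:", same_checksum)
print("CRC32 unchanged:             ", same_crc)
```

The CRC catches this particular corruption, but only if the CRC is actually computed over the data at the point where the damage happens.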


That seems like a lot, but getting better quality fiber easily compensates for this. Expressed differently, the average amount of data transmitted where you see one packet with three flipped bits is around 10 petabytes for 11454 byte packets and some 1.3 exabytes for 1500 byte packets. For the large packets that would be one packet in three years at 1 Gbps, for the small ones one packet in 380 years.
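The data volumes and time spans above can be checked with a short sketch. It takes the rounded per-packet probabilities from the thread (10^-12 and 10^-15) as given and assumes a fully loaded 1 Gbps link; the helper names are made up for illustration. The quoted figures appear to match when petabytes and exabytes are read as binary units (2^50 and 2^60 bytes).

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def bytes_between_events(packet_bytes, p_event):
    """Average bytes sent per packet that has the given per-packet probability."""
    return packet_bytes / p_event


def years_at_rate(total_bytes, bits_per_second=1e9):
    """Time to send total_bytes over a link running flat out at the given rate."""
    return total_bytes * 8 / bits_per_second / SECONDS_PER_YEAR


large_bytes = bytes_between_events(11454, 1e-12)
small_bytes = bytes_between_events(1500, 1e-15)

print(f"11454-byte packets: {large_bytes / 2**50:.1f} PiB, "
      f"{years_at_rate(large_bytes):.1f} years at 1 Gbps")
print(f"1500-byte packets:  {small_bytes / 2**60:.1f} EiB, "
      f"{years_at_rate(small_bytes):.0f} years at 1 Gbps")
```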

Consider that the CRC is not always carried with the packet between interfaces.


-Doug