North American Network Operators Group

Re: 923Mbits/s across the ocean

  • From: Iljitsch van Beijnum
  • Date: Mon Mar 10 18:43:10 2003

On Sun, 9 Mar 2003, Richard A Steenbergen wrote:

> On the send side, the application transmitting is guaranteed to utilize
> the buffers immediately (ever seen a huge jump in speed at the beginning
> of a transfer? This is the local buffer being filled, and the application
> has no way to know if this data is going out to the wire, or just to the
> kernel). Then the network must drain the packets onto the wire, sometimes
> very slowly (think about a dialup user downloading from your GigE server).

Actually this is often way too fast, as the congestion window grows by
one segment per ACK during slow start and so doubles every round trip.
This means that with a large buffer = large window and a bottleneck
somewhere along the way, you are almost guaranteed to have some serious
congestion in the early stages of the session and lower levels of
congestion periodically later on whenever TCP tries to figure out how
large the congestion window can get without losing packets.
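
A minimal sketch of that early overshoot, under idealized assumptions:
one loss event per round trip, a bottleneck that passes a fixed number
of packets per RTT (queue included), and a crude halving on loss. The
function name and all parameters are made up for illustration; this is
not any real stack's algorithm.

```python
def simulate(capacity, max_window, rtts):
    """Return (rtt, packets_sent, packets_dropped) per round trip.

    capacity   -- packets the bottleneck (link plus queue) absorbs per RTT
    max_window -- cap imposed by the socket buffer / advertised window
    """
    cwnd = 1
    ssthresh = max_window
    rows = []
    for rtt in range(rtts):
        sent = min(cwnd, max_window)
        dropped = max(0, sent - capacity)
        rows.append((rtt, sent, dropped))
        if dropped:
            # Loss: halve the window and leave slow start.
            ssthresh = max(2, sent // 2)
            cwnd = ssthresh
        elif sent < ssthresh:
            # Slow start: one segment per ACK, so the window doubles per RTT.
            cwnd = sent * 2
        else:
            # Congestion avoidance: roughly one segment per RTT.
            cwnd = sent + 1
    return rows


if __name__ == "__main__":
    for label, window in (("large window", 1024), ("small window", 64)):
        print(label)
        for rtt, sent, dropped in simulate(capacity=100, max_window=window, rtts=12):
            print(f"  rtt={rtt:2d} sent={sent:4d} dropped={dropped}")
```

With the large window the sender jumps from 64 to 128 packets in one
round trip and loses 28 of them; with the window capped at 64 it never
exceeds what the bottleneck can take.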

This is the part about TCP that I've never understood: why does it send
large numbers of packets back-to-back? This is almost never a good idea.
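
To put a number on why back-to-back sending hurts, here is a rough
sketch that compares the peak queue depth at a FIFO bottleneck for a
burst versus the same packets spaced at the bottleneck's service rate.
The model (one packet forwarded per time unit, no drops) and the name
peak_queue are assumptions for illustration only.

```python
from collections import deque


def peak_queue(arrivals, service_time):
    """Peak number of packets queued or in service at a FIFO bottleneck.

    arrivals     -- non-decreasing packet arrival times
    service_time -- time the bottleneck needs to forward one packet
    """
    departures = deque()   # departure times of packets still in the system
    last_departure = 0
    peak = 0
    for t in arrivals:
        # Packets that have already left no longer occupy the queue.
        while departures and departures[0] <= t:
            departures.popleft()
        # FIFO: this packet starts service once its predecessor is done.
        last_departure = max(t, last_departure) + service_time
        departures.append(last_departure)
        peak = max(peak, len(departures))
    return peak


if __name__ == "__main__":
    burst = [0] * 10          # ten packets sent back-to-back
    paced = list(range(10))   # the same ten packets, one per time unit
    print("burst:", peak_queue(burst, 1))
    print("paced:", peak_queue(paced, 1))
```

The burst needs room for all ten packets at the bottleneck, while the
paced sender never has more than one packet there.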

> On the receive side, the socket buffers must be large enough to
> accommodate all the data received between application read()'s,

That's not true. It's perfectly acceptable for TCP to stall when the
receiving application fails to read the data fast enough. (TCP then
simply announces a window of 0 to the other side so the communication
effectively stops until the application reads some data and a nonzero
window is announced.) Otherwise the kernel would be required to buffer unlimited
amounts of data in the event an application fails to read it from the
buffer for some time (which is a very common situation).
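
The window mechanics can be sketched with a toy receiver. This is a
simplified model, not kernel code: the class and method names are
invented, and real stacks add details such as window scaling and
silly-window avoidance.

```python
class Receiver:
    """Toy model of a TCP receive buffer and its advertised window."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffered = 0   # bytes received but not yet read by the app

    def window(self):
        # The advertised window is whatever buffer space is still free.
        return self.buffer_size - self.buffered

    def deliver(self, nbytes):
        # The sender may not exceed the advertised window, so the
        # receiver never has to hold more than buffer_size bytes.
        accepted = min(nbytes, self.window())
        self.buffered += accepted
        return accepted

    def app_read(self, nbytes):
        # The application draining the buffer reopens the window.
        taken = min(nbytes, self.buffered)
        self.buffered -= taken
        return taken


if __name__ == "__main__":
    rx = Receiver(buffer_size=65536)
    print("accepted:", rx.deliver(100000))
    print("window after fill:", rx.window())
    rx.app_read(4096)
    print("window after read:", rx.window())
```

Once the buffer is full the window is 0 and the sender stalls; a single
4096-byte read is enough to advertise a 4096-byte window again.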

> locally. Jumbo frames help too, but their real benefit is not the
> simplistic "hey look, there's 1/3rd the number of frames/sec" view that many
> people see. The good stuff comes from techniques like page flipping, where
> the NIC DMA's data into a memory page which can be flipped through the
> system straight to the application, without copying it throughout. Some
> day TCP may just be implemented on the NIC itself, with ALL work
> offloaded, and the system doing nothing but receiving nice page-sized
> chunks of data at high rates of speed.

Hm, I don't see this happening to a usable degree as TCP has no concept
of records. You really want to use fixed size chunks of information here
rather than pretending everything's a stream.

> IMHO the 1500 byte MTU of ethernet
> will still continue to prevent good end to end performance like this for a
> long time to come. But alas, I digress...

Don't we all? I'm afraid you're right. Anyone up for modifying IPv6 ND
to support a per-neighbor MTU? This should make backward-compatible
adoption of jumbo frames a possibility. (Maybe retrofit ND into v4 while
we're at it.)
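
Roughly what I have in mind, as a sketch only. Nothing here is part of
ND as specified; the table, its method names and the idea that a
neighbor advertises its own MTU are all hypothetical. Neighbors that
say nothing are assumed to be plain 1500-byte hosts, which is where the
backward compatibility comes from.

```python
DEFAULT_MTU = 1500   # assumed for any neighbor that advertises nothing


class NeighborMtuTable:
    """Hypothetical per-neighbor MTU cache kept alongside the neighbor cache."""

    def __init__(self, link_mtu):
        self.link_mtu = link_mtu   # largest frame our own interface can send
        self.neighbors = {}        # neighbor address -> advertised MTU

    def learn(self, neighbor, advertised_mtu):
        # Would be driven by a (hypothetical) MTU option in a neighbor message.
        self.neighbors[neighbor] = advertised_mtu

    def mtu_for(self, neighbor):
        # Never send more than either end of the hop can handle.
        advertised = self.neighbors.get(neighbor, DEFAULT_MTU)
        return min(self.link_mtu, advertised)


if __name__ == "__main__":
    table = NeighborMtuTable(link_mtu=9000)
    table.learn("fe80::1", 9000)   # jumbo-capable neighbor
    table.learn("fe80::2", 4470)   # neighbor with a smaller limit
    for addr in ("fe80::1", "fe80::2", "fe80::3"):
        print(addr, table.mtu_for(addr))
```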

Iljitsch van Beijnum