North American Network Operators Group

Re: 923Mbits/s across the ocean

  • From: Iljitsch van Beijnum
  • Date: Tue Mar 11 05:36:28 2003

On Mon, 10 Mar 2003, Richard A Steenbergen wrote:

> > > On the receive side, the socket buffers must be large enough to
> > > accommodate all the data received between application read()'s,

> > That's not true. It's perfectly acceptable for TCP to stall when the
> > receiving application fails to read the data fast enough.

> Ok, I think I was unclear. You don't NEED to have buffers large enough to
> accommodate all that data received between application read()'s, unless
> you are trying to achieve maximum performance. I thought that was the
> general framework we were all working under. :)

You got me there.  :-)

It seemed you were talking about more general requirements at that
point, though, with the upper and lower limits for kernel buffer space
and all.
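
To put numbers on the "large enough" part: at the 923 Mbit/s from the
subject line and an assumed transatlantic RTT of 150 ms, the
bandwidth-delay product is about 923e6 * 0.150 / 8 ~= 17 MB, so a
maximum-performance receiver needs a socket buffer on that order. A
minimal Python sketch (both figures are assumptions, not measurements):

    import socket

    # Assumed figures: 923 Mbit/s from the subject line, ~150 ms RTT.
    # Bandwidth-delay product: 923e6 * 0.150 / 8 ~= 17 MB can be in
    # flight, so the receive buffer must cover at least that much.
    BDP_BYTES = int(923e6 * 0.150 / 8)

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The kernel may clamp this to its configured maximum (e.g.
    # net.core.rmem_max on Linux), the upper limit mentioned above.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BDP_BYTES)
    print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))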

> > Hm, I don't see this happening to a usable degree as TCP has no concept
> > of records. You really want to use fixed size chunks of information here
> > rather than pretending everything's a stream.

> We're talking optimizations for high performance transfers... It can't
> always be a stream.

Right. But TCP is a stream protocol. This has many advantages, nearly
all of which are irrelevant for high-volume, high-bandwidth bulk data
transfer.
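
(This is easy to see in code: a TCP receiver only ever gets bytes, so
any record structure has to be reimposed by the application, for
instance with a length prefix. The framing below is purely
illustrative.)

    import socket
    import struct

    def recv_exact(sock, n):
        """Read exactly n bytes; TCP makes no promise about how the
        stream is chopped up across recv() calls."""
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise EOFError("connection closed mid-record")
            buf += chunk
        return buf

    def recv_record(sock):
        # A 4-byte length prefix reimposes record boundaries on the
        # stream -- TCP itself has no concept of them.
        (length,) = struct.unpack("!I", recv_exact(sock, 4))
        return recv_exact(sock, length)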

I can imagine a system that only works in one direction, where the
data is split into fixed-size records (each of which would ideally fit
into a single packet) and each record is acknowledged independently
(though certainly not with one acknowledgment per packet). I would also
want to take advantage of traffic classification mechanisms: first the
data is flooded at maximum speed at the lowest possible traffic class,
and everything that doesn't make it to the other end is then resent
more slowly at a higher traffic class. If the network supports priority
queuing, this would effectively sponge up all free bandwidth without
impacting regular interactive traffic. If after a few retries some data
still hasn't made it, simply skip it for now (but keep a record of the
missing bits) and keep going. Many applications can live with some lost
data, and for the others it's probably more efficient to keep running
at high speed and repair the gaps afterwards.
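
Purely as an illustration of that flood-low/repair-high idea (the
record format, DSCP code points, timings and names are all assumptions,
not an existing protocol), the sending side could look roughly like
this over UDP:

    import socket
    import struct
    import time

    RECORD_SIZE = 1024   # fixed-size records, ideally one per packet
    DSCP_LOW = 0x08      # CS1 "scavenger" class for the initial flood
    DSCP_HIGH = 0x28     # CS5 for the slower repair passes
    MAX_RETRIES = 3

    def send_records(data, dest):
        """Blast fixed-size records at the lowest traffic class, then
        resend whatever wasn't acknowledged, more slowly and at a
        higher class. Returns the record numbers that never made it,
        so the caller can repair the gaps afterwards."""
        records = [data[i:i + RECORD_SIZE]
                   for i in range(0, len(data), RECORD_SIZE)]
        outstanding = set(range(len(records)))

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(0.5)

        for attempt in range(1 + MAX_RETRIES):
            dscp = DSCP_LOW if attempt == 0 else DSCP_HIGH
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
            pace = 0.0 if attempt == 0 else 0.001  # repairs go slower

            for seq in sorted(outstanding):
                # Each record carries its own sequence number, so it
                # is acknowledged independently of any stream position.
                sock.sendto(struct.pack("!I", seq) + records[seq], dest)
                if pace:
                    time.sleep(pace)

            try:
                while outstanding:
                    # The receiver acks record numbers, not packets.
                    ack, _ = sock.recvfrom(4)
                    outstanding.discard(struct.unpack("!I", ack)[0])
            except socket.timeout:
                pass  # survivors go out again on the next pass

            if not outstanding:
                break

        # Skip whatever still didn't make it, but keep the record of
        # the missing bits for a later repair pass.
        return outstanding

With priority queuing in the network, the first pass only consumes
otherwise idle capacity, which is the sponging-up effect described
above.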

> > > IMHO the 1500-byte MTU of ethernet will still continue to prevent
> > > good end-to-end performance like this for a long time to come. But
> > > alas, I digress...

> > Don't we all? I'm afraid you're right. Anyone up for modifying IPv6 ND
> > to support a per-neighbor MTU? This should make backward-compatible
> > adoption of jumboframes a possibility. (Maybe retrofit ND into v4 while
> > we're at it.)

> Not necessarily sure that's the right thing to do, but SOMETHING has got
> to be better than what passes for path MTU discovery now. :)

We can't replace path MTU discovery (though hopefully people will start
to realize that ICMP messages were invented for reasons other than job
security for firewalls). But what we need is a way for 10/100 Mbps,
1500-byte hosts to live on the same subnet with 1000 Mbps, 9000-byte
hosts. I thought IPv6 neighbor discovery supported this, since ND can
communicate the MTU between hosts on the same subnet, but unfortunately
this is a subnet-wide MTU and not a per-host MTU, which is what we
really need.
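
(What path MTU discovery gives you is at least per-destination. On
Linux, for instance, a connected socket can be asked for the kernel's
current path MTU estimate; the constants below come from <linux/in.h>
since Python doesn't expose them all by name, and the destination
address is illustrative.)

    import socket

    # From <linux/in.h>:
    IP_MTU_DISCOVER = 10
    IP_PMTUDISC_DO = 2
    IP_MTU = 14

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Turn on per-route PMTU discovery (sets DF on outgoing packets).
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    sock.connect(("192.0.2.1", 9))  # illustrative destination
    # After connect() the socket has a route, so the kernel's current
    # path MTU estimate for this destination can be read back.
    print(sock.getsockopt(socket.IPPROTO_IP, IP_MTU))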

Iljitsch