Re: latency (was: RE: cooling door)

  From: Adrian Chadd
  Date: Sun Mar 30 00:56:59 2008

On Sun, Mar 30, 2008, Mikael Abrahamsson wrote:
> On Sat, 29 Mar 2008, Frank Coluccio wrote:
> >Please clarify. To which network element are you referring in connection
> >with extended lookup times? Is it the collapsed optical backbone switch,
> >or the upstream L3 element, or perhaps both?
> I am talking about the matter that the following topology:
> server - 5 meter UTP - switch - 20 meter fiber - switch - 20 meter 
> fiber - switch - 5 meter UTP - server
> has worse NFS performance than:
> server - 25 meter UTP - switch - 25 meter UTP - server
> Imagine bringing this into metro with 1-2ms delay instead of 0.1-0.5ms.
> This is one of the issues that the server/storage people have to deal with.

That's because the LAN protocols need to be re-jiggled a little to start
looking less like LAN protocols and more like WAN protocols. Applications
need similar changes.
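
To put rough numbers on the latency point: if a client keeps only a few
requests in flight, throughput is bounded by (requests in flight x request
size) / round-trip time. A minimal Python sketch, assuming 32 KiB requests
(an illustrative figure, not a measurement) and the delays quoted above
treated as RTTs; the function name is made up for this example:

```python
def max_throughput_mbps(request_bytes, rtt_ms, outstanding=1):
    """Upper bound on throughput (Mbit/s) with `outstanding` requests in flight.

    Ignores serialisation time, server think time and protocol overhead,
    so real numbers will be lower.
    """
    rtt_s = rtt_ms / 1000.0
    return outstanding * request_bytes * 8 / rtt_s / 1e6


REQUEST_BYTES = 32 * 1024  # assumed request size, purely illustrative

for rtt_ms in (0.1, 0.5, 1.0, 2.0):
    one = max_throughput_mbps(REQUEST_BYTES, rtt_ms)
    eight = max_throughput_mbps(REQUEST_BYTES, rtt_ms, outstanding=8)
    print(f"RTT {rtt_ms:4.1f} ms: 1 in flight {one:8.1f} Mbit/s, "
          f"8 in flight {eight:8.1f} Mbit/s")
```

The bound scales inversely with RTT, so a protocol that is fine at 0.1ms
needs bigger requests or more of them in flight to hold the same rate at
1-2ms.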

I helped a friend debug an NFS throughput issue between some Linux servers
running Fortran-77 based numerical analysis code and a 10GE storage backend.
The storage backend could push 10GE without too much trouble, but the
application wasn't poking the kernel in the right way (large fetches and
prefetching, basically) to fully utilise the infrastructure.
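
For illustration only (the application in question was Fortran, and this is
not what we actually changed): a minimal Python sketch of "large fetches and
prefetching" on a POSIX system. It hints sequential access with
posix_fadvise() and reads in big chunks. The 4 MiB chunk size and the
function name are assumptions, and the hint is advisory, so whether the
kernel or NFS client reads ahead more aggressively depends on the system.

```python
import os


def read_large_chunks(path, chunk_bytes=4 * 1024 * 1024):
    """Read a file sequentially in large chunks; return total bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Advisory hint that access will be sequential. Not available on
        # every platform, and the kernel is free to ignore it.
        if hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                pass  # hint not supported for this file; carry on without it
        while True:
            # One large read instead of many small ones means fewer
            # round trips to the storage backend.
            chunk = os.read(fd, chunk_bytes)
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```

Calling read_large_chunks() on a file on the NFS mount, with a few different
chunk sizes, is a cheap way to see whether request size is the bottleneck.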

Oh, and kernel HZ tickers can have similar effects on network traffic if the
application does dumb stuff. If you're (un)lucky you may see 1 or 2ms of
delay between a packet arriving and the process that handles it being
scheduled. This doesn't matter much over links with 250ms+ of latency, but
it matters on links with 0.1ms - 1ms of latency.
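
A back-of-the-envelope sketch in Python of why the same delay matters so
differently. The HZ values are common Linux build options, used here as
assumptions, and the function names are made up for this example:

```python
def tick_period_ms(hz):
    """Length of one kernel timer tick in milliseconds."""
    return 1000.0 / hz


def scheduling_share(link_latency_ms, sched_delay_ms):
    """Fraction of total one-way delay that is due to scheduling delay."""
    return sched_delay_ms / (link_latency_ms + sched_delay_ms)


for hz in (100, 250, 1000):  # assumed HZ settings
    print(f"HZ={hz}: tick period {tick_period_ms(hz):.0f} ms")

SCHED_DELAY_MS = 2.0  # the "1 or 2ms" case from the text above
for link_ms in (250.0, 1.0, 0.1):
    share = scheduling_share(link_ms, SCHED_DELAY_MS)
    print(f"link {link_ms:6.1f} ms: scheduling is {share:.1%} of total delay")
```

On the 250ms link the scheduling delay is under 1% of the total; on the
0.1ms link it is most of it.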

(Can someone please apply some science to this and publish best practices?)