North American Network Operators Group

RE: packet reordering at exchange points

  • From: Kavi, Prabhu
  • Date: Tue Apr 09 14:01:46 2002

An interesting historical observation:

Many years ago when I used to create discrete event simulation 
network models for a living, I had one project which was to model 
(what was then) a widely implemented PC TCP stack.  I remember that 
one wart of this implementation was that when packet reordering 
occurred it collapsed the window size to 1!  

Anyone know if strange warts like this still exist in desktop 
systems?

Prabhu
----------------------------------------------------------------------
Prabhu Kavi                     Phone:  1-978-264-4900 x125 
Director, Adv. Prod. Planning   Fax:    1-978-264-0671
Tenor Networks                  Email:  [email protected]
100 Nagog Park                  WWW:    www.tenornetworks.com
Acton, MA 01720


> -----Original Message-----
> From: Iljitsch van Beijnum [mailto:[email protected]]
> Sent: Tuesday, April 09, 2002 12:36 PM
> To: Stephen Sprunk
> Cc: [email protected]
> Subject: Re: packet reordering at exchange points
> 
> 
> 
> On Mon, 8 Apr 2002, Stephen Sprunk wrote:
> 
> > Thus spake "Iljitsch van Beijnum" <[email protected]>
> > > But how is packet reordering on two parallel gigabit interfaces
> > > ever going to translate into reordered packets for individual
> > > streams?
> 
> > Think of a large FTP between two well-connected machines.  Such
> > flows tend to generate periodic clumps of packets; split one of
> > these clumps across two pipes and the clump will arrive out of
> > order at the other end.  The resulting mess will create a clump of
> > retransmissions, then another bigger clump of new data, ...
> 
> I don't think it will be this bad, even if hosts are connected at
> GigE and the trunk is 2 x GigE. In this case, a (delayed) ACK will
> usually acknowledge 2 segments so it will trigger transmission of two
> new segments. These will arrive back to back at the router/switch
> doing the load balancing. Since there is obviously need for more than
> 1 Gbit worth of bandwidth, it is likely the average queue size is at
> least close to 1 (= ~65% line use) or even higher. If this is the
> case, there is a _chance_ the second packet gains a full packet time
> over the first and arrives first at the destination.  However, this
> is NOT especially likely if both packets are the same size:  the
> _average_ queue sizes will be the same so in half the cases the first
> packet gains an even bigger advance over the second, and only in a
> fraction of half the cases the second packet gains enough over the
> first to pass it. And then, the destination host still only sees a
> single packet coming in out of order, which isn't enough to trigger
> fast retransmit.
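
The queueing argument above can be checked with a small Monte Carlo
sketch. Everything in it is an assumption made for illustration, not
something stated in the post: each link's queue depth is drawn
independently from a geometric distribution with parameter rho (an
M/M/1-style model), all packets are the same size, and the second
packet of a back-to-back pair reaches its link exactly one packet time
after the first. The names queue_depth and reorder_fraction are
hypothetical.

```python
import random


def queue_depth(rho, rng):
    """Sample a queue depth in packets, P(n) = (1 - rho) * rho**n (assumed model)."""
    n = 0
    while rng.random() < rho:
        n += 1
    return n


def reorder_fraction(rho=0.65, trials=100_000, seed=1):
    """Estimate how often packet 2 of a back-to-back pair overtakes packet 1.

    Packet 1 goes to link A at t = 0, packet 2 goes to link B one packet
    time later (per-packet round robin). Times are in packet times.
    """
    rng = random.Random(seed)
    reordered = 0
    for _ in range(trials):
        q_a = queue_depth(rho, rng)  # packets already queued on link A
        q_b = queue_depth(rho, rng)  # packets already queued on link B
        done_1 = q_a + 1             # packet 1 fully transmitted
        done_2 = 1 + q_b + 1         # packet 2 starts one packet time late
        if done_2 < done_1:          # packet 2 must gain more than one packet time
            reordered += 1
    return reordered / trials
```

Under these assumptions the result should sit near rho**2 / (1 + rho),
roughly 0.26 for rho = 0.65, which fits the "fraction of half the
cases" estimate. Each such event is still a single out-of-order packet
at the receiver.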
> 
> You need to load balance over more than two connections to trigger
> unnecessary fast retransmit (over two lines, packet #3 isn't going to
> pass by packet #1), AND you need to send more than two packets back
> to back. Also, you need to be at the same speed as the load balanced
> lines, otherwise your packet train gets split up by traffic from
> other interfaces or idle time on the line.
> 
> And _then_, if all of this happens, all the retransmitted data is
> left of window. I'm not even sure if those packets generate an ACK,
> and if they do, if the sender takes any action on this ACK. If this
> triggers another round of fast retransmit, the FR implementation
> should be considered broken, IMO.
> 
> > > Packets for streams that are subject to header compression or
> > > for voice over IP or even Mbone are nearly always transmitted
> > > at relatively large intervals, so they can't travel down parallel
> > > paths simultaneously.
> 
> > RTP reordering isn't a problem in my experience, probably since RTP
> > has an inherent resequencing mechanism.
> 
> My point is real time protocols will not see reordering unless they
> are using up nearly the full line speed or there is congestion,
> because these protocols don't send out packets back to back like TCP
> sometimes does. How big are VoIP packets? Even with an 80 byte
> payload you get 100 packets per second = 10 ms between packets, which
> is more than 80 packet times for GigE = congestion. And if there is
> congestion, all performance bets are off.
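
The spacing arithmetic above can be reproduced with a few lines of
Python. This is a rough sketch under assumptions the post does not
state: a 64 kbit/s voice codec (which is what 80 bytes every 10 ms
works out to), 1500-byte packets as the competing traffic, and no
framing overhead. The helper names are hypothetical.

```python
GIGE_BPS = 1_000_000_000  # nominal Gigabit Ethernet line rate


def packet_interval_ms(payload_bytes, codec_bps):
    """Time between voice packets for a constant-rate codec, in milliseconds."""
    return payload_bytes * 8 / codec_bps * 1e3


def serialization_time_us(packet_bytes, link_bps):
    """Time to clock one packet onto the link, in microseconds."""
    return packet_bytes * 8 / link_bps * 1e6


# 80-byte payload at an assumed 64 kbit/s codec: about 10 ms, i.e. 100 pps.
interval_ms = packet_interval_ms(80, 64_000)

# One assumed 1500-byte packet on GigE: about 12 microseconds.
full_packet_us = serialization_time_us(1500, GIGE_BPS)

# How many full-size packet times fit between two voice packets.
packet_times = interval_ms * 1000 / full_packet_us
```

With these assumed sizes, packet_times comes out at several hundred,
so the "more than 80 packet times" figure in the post is a
conservative lower bound.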
> 
> It seems to me spending (CPU) time and money to do more complex load
> balancing than per packet round robin in order to avoid reordering
> only helps some people with GigE connected hosts some of the time.
> Using this time or money to overcome congestion is probably a better
> investment.
> 
> PS. For everyone looking at their netstat -p tcp output: packet loss
>     also counts towards the out of order packets, it is hard to get
>     the real out of order figures.
> 
> PS2. Isn't it annoying to have to think about layer 4 to build
>      layer 2 stuff?
> 
>