North American Network Operators Group


Re: MicroSoft amplification?

  • From: Tony Rall
  • Date: Thu Aug 02 02:27:49 2001

On Thursday, 2001/08/02 at 00:49 AST, [email protected] wrote:
> On Wed, 01 Aug 2001 16:26:55 PDT, Tony Rall <[email protected]> said:
> > echo).  This probably makes PMTUD work a lot better, but it sucks for ICMP
> 
> Or totally horques it up entirely if the actual data path used has a
> different PMTU.  No way this will work if 9 paths are clean and one
> requires a frag. ;)
> 
> I won't discuss what to do if you get back 10 FRAG NEEDED packets, with
> differing frag sizes ;)

That's not the way it works.  You've got a load-balancing-system (LBS) 
front-ending (using a single IP address) a cluster of 10 web servers.  A 
client on the other side of the LBS initiates an 80/tcp connection.   The 
LBS directs it to (let's say) server 6.  Once data starts flowing on the 
connection, nothing interesting happens until server 6 sends a large 
packet to the client with (as on all of its packets) the Don't-frag flag 
on.  That packet reaches a link with a smaller mtu.  The router on that 
link returns (to the server complex) an ICMP unreachable, fragmentation 
needed packet (type 3, code 4).
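
For reference, the message that router sends can be sketched roughly as below. This is only an illustration of the type 3, code 4 layout with the next-hop MTU field that RFC 1191 adds; the function names are made up for this example and it is not how any particular router builds the packet.

```python
import struct

ICMP_DEST_UNREACH = 3   # ICMP type: destination unreachable
ICMP_FRAG_NEEDED = 4    # ICMP code: fragmentation needed and DF set


def internet_checksum(data: bytes) -> int:
    """Standard Internet checksum: one's complement of the 16-bit one's complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frag_needed(next_hop_mtu: int, original_datagram: bytes) -> bytes:
    """Build the ICMP message (without the outer IP header).

    The body quotes the offending datagram's IP header plus its first
    8 bytes, which for TCP covers the source and destination ports.
    """
    ihl = (original_datagram[0] & 0x0F) * 4
    quoted = original_datagram[:ihl + 8]
    # Layout: type, code, checksum, unused (16 bits), next-hop MTU (16 bits).
    header = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                         0, 0, next_hop_mtu)
    csum = internet_checksum(header + quoted)
    header = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                         csum, 0, next_hop_mtu)
    return header + quoted
```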

That ICMP reaches the LBS; it has to decide what to do with it.  Some LBSs 
will just discard any ICMP packets addressed to the cluster.  The one used 
by MS instead forwards it to all the back-end servers.  The servers that 
don't have a session with the client may just discard the ICMP packet (or 
they may simply update info in their routing table (I know that AIX does 
that)).  The server that does have a session with the client will 
repackage his data packet (per the newly learned mtu) and send the smaller 
packet.  The path from the chosen server to the client is no more 
ambiguous than any other PMTUD situation.  Which is to say that, yes, the 
path could change from packet to packet, but that isn't brought on by the 
presence of the LBS, it's just a shortcoming of the PMTUD mechanism.  In 
fact outbound traffic from the clustered servers often doesn't even go 
through the LBS.
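
A rough sketch of what each back-end server might do with the forwarded ICMP follows. It assumes the "discard if no session" behavior (not the AIX-style routing table update), a hypothetical per-client cache, and 40 bytes of IP plus TCP headers with no options; real stacks are more involved.

```python
IPV4_MIN_MTU = 68        # smallest MTU an IPv4 link may have
IP_TCP_HEADERS = 40      # 20-byte IP header + 20-byte TCP header, no options


def on_frag_needed(pmtu_cache, sessions, client, next_hop_mtu, default_mtu=1500):
    """Handle a forwarded 'fragmentation needed' ICMP on one back-end server.

    Returns the new maximum TCP payload size if this server has a session
    with the client and the reported MTU is a real reduction, else None.
    """
    if client not in sessions:
        return None  # no session with this client: just drop the ICMP
    current = pmtu_cache.get(client, default_mtu)
    if next_hop_mtu < IPV4_MIN_MTU or next_hop_mtu >= current:
        return None  # implausible value or no reduction: ignore it
    pmtu_cache[client] = next_hop_mtu
    return next_hop_mtu - IP_TCP_HEADERS
```

Only the server holding the session (server 6 in the example) gets a non-None result and resends with smaller packets; the other nine return None.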

(Note that if the client is also using PMTUD and happens to send a large 
enough packet to trigger it, the only ICMP unreachable sent would be 
towards the client.  Even if the link mtu causing the unreachable is on 
the server side of the LBS, there will be only one unreachable sent - no 
ambiguity at all.)

(Also note that it isn't necessary for an LBS to forward all ICMPs to make 
PMTUD work.  It just has to forward the unreachable, fragmentation needed 
packets.  And it doesn't have to forward those to all the back-end 
servers.  There is enough info in the unreachable message to determine 
which connection this ICMP message relates to - the 80/tcp connection 
between the client and server 6.  So the LBS could know that this ICMP 
only has to be forwarded to server 6.  I don't know of any LBS that is 
smart enough to handle it this way.)
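
To illustrate the lookup described above, here is a minimal sketch of how an LBS could pick the one back-end from the quoted headers inside the ICMP. The connection table, its key layout, and the function name are assumptions made for this example, not the behavior of any real product.

```python
import socket
import struct


def backend_for_icmp(icmp_msg: bytes, conn_table: dict):
    """Return the back-end that owns the connection quoted in the ICMP, or None.

    icmp_msg is the ICMP message without the outer IP header. conn_table is a
    hypothetical map of (client_ip, client_port, vip, vip_port) -> back-end.
    """
    if len(icmp_msg) < 8 + 20 + 8:
        return None
    if (icmp_msg[0], icmp_msg[1]) != (3, 4):
        return None  # not "unreachable, fragmentation needed"
    quoted = icmp_msg[8:]            # original IP header + first 8 data bytes
    if quoted[0] >> 4 != 4:
        return None                  # not IPv4
    ihl = (quoted[0] & 0x0F) * 4
    if ihl < 20 or len(quoted) < ihl + 4:
        return None
    if quoted[9] != 6:
        return None                  # quoted packet was not TCP
    # The quoted packet was sent by the server, so its source is the
    # cluster address and its destination is the client.
    vip = socket.inet_ntoa(quoted[12:16])
    client = socket.inet_ntoa(quoted[16:20])
    vip_port, client_port = struct.unpack("!HH", quoted[ihl:ihl + 4])
    return conn_table.get((client, client_port, vip, vip_port))
```

With a table entry such as ("198.51.100.7", 51515, "192.0.2.10", 80) mapped to "server6", an ICMP quoting that connection resolves to "server6" and anything else resolves to None.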

Tony Rall