North American Network Operators Group
PMTU-D: remember, your load balancer is broken
This is your monthly PMTU-D horkage rant.

Chances are that if you are using a load balancer for TCP connections, it does not properly handle Path MTU Discovery. Examples of devices that, last I knew, do not handle this properly are LocalDirectors and ArrowPoints. F5 claimed some time ago that they fixed their BIG/ip product to do this properly (remember when they broke NSI's whois service this way?), but I haven't seen it in action yet and don't know what version is required. Their support channels don't seem to know much about it when asked, and give nonsensical answers like "it is built into the BSD/OS system that our product is built on". I would love to know about any load balancers that actually do handle this right.

For an explanation of PMTU-D, see http://users.worldgate.ca/~marcs/mtu/

What happens with most load balancers is that when the server behind them tries to use PMTU-D, the ICMP "can't fragment" message that may come back from a router between the server and the client never makes it to the load-balanced server, because the load balancer throws it away. The server never learns that it has to send smaller packets, so most users with a path MTU that is less than min(client MTU, server MTU) will be unable to receive data from the server.

The fix is to bitch at your vendor to fix their broken system, and to tell them to hire someone who knows something about how TCP works.

If you are a vendor, then make sure your load balancing software works right. It needs to either send the "can't fragment" message on to just the backend servers that have connections from the remote IP, or flood it to all of them.

The workaround for anyone stuck with such a load balancer is to disable PMTU-D on the backend servers. This is your only option if the vendor of your load balancer doesn't care or takes a while to release a fix.
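For the vendors in the audience, here is a rough sketch in Python of the decision described above: look inside the ICMP "can't fragment" message (type 3, code 4), pull out the client address from the embedded copy of the original packet, and pick the backends to pass it on to. This is only an illustration under stated assumptions, not how any particular product does it. The names `parse_frag_needed`, `backends_for` and the `conn_table` layout (a dict keyed by client IP and port) are made up for the example, and falling back to flooding when no connection matches is my own choice.

```python
import socket
import struct

ICMP_DEST_UNREACH = 3   # ICMP type: destination unreachable
ICMP_FRAG_NEEDED = 4    # ICMP code: fragmentation needed and DF set
IPPROTO_TCP = 6


def parse_frag_needed(icmp: bytes):
    """Parse an ICMP message (starting at the ICMP header).

    Returns (client_ip, client_port, vip, vip_port, next_hop_mtu) if it is a
    "can't fragment" message about a TCP packet, otherwise None.
    """
    # 8 bytes ICMP header + at least a 20 byte IP header + 8 bytes of TCP.
    if len(icmp) < 8 + 20 + 8:
        return None
    icmp_type, icmp_code, _cksum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type != ICMP_DEST_UNREACH or icmp_code != ICMP_FRAG_NEEDED:
        return None

    inner = icmp[8:]                     # copy of the packet the server sent
    if inner[0] >> 4 != 4:               # only IPv4 is handled in this sketch
        return None
    ihl = (inner[0] & 0x0F) * 4          # embedded IP header length in bytes
    if ihl < 20 or len(inner) < ihl + 8 or inner[9] != IPPROTO_TCP:
        return None

    # The embedded packet went from the virtual IP to the client, so its
    # destination address and port identify the client.
    vip = socket.inet_ntoa(inner[12:16])
    client_ip = socket.inet_ntoa(inner[16:20])
    vip_port, client_port = struct.unpack("!HH", inner[ihl:ihl + 4])
    return client_ip, client_port, vip, vip_port, mtu


def backends_for(icmp: bytes, conn_table: dict, all_backends: list) -> list:
    """Choose which backends should receive this ICMP message.

    conn_table is a hypothetical map {(client_ip, client_port): backend}.
    """
    info = parse_frag_needed(icmp)
    if info is None:
        return []                        # not a PMTU-D message, nothing to do
    client_ip = info[0]
    matched = sorted({backend
                      for (ip, _port), backend in conn_table.items()
                      if ip == client_ip})
    # Assumption: if no connection matches, flood rather than drop.
    return matched or list(all_backends)
```

Matching on the remote IP alone, rather than the full address and port pair, mirrors the "backend servers that have connections from the remote IP" option; a real device would still have to rewrite addresses and checksums before forwarding, which this sketch leaves out.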
If you get complaints from a small subset of clients that can open a TCP connection to your load-balanced IP but can't receive any response to their request, this could be what is up. (Yes, www.slashdot.org seems to be broken in this way as I type. Oh well, slashdot isn't always a good thing...)

This has been your monthly PMTU-D horkage rant.