North American Network Operators Group
Re: TCP session disconnection caused by Code Red?
I've been told (but not given permission to forward details of who/how/what) that some major sites with a single router and relatively flat network topology are dying due to the ARP request flood that is being generated by Code Red scans on the inside of their border router choking the router. Check the rate of ARP requests coming off your border router and see if it seems excessive; if so, that may be it.
2 points:

1. RFC826 appears to mandate only positive ARP caching. I can't see a reason why negative ARP caching shouldn't work this way: Keep only one ARP request in flight at a time. Retry ARPs a maximum of [N] times, separated by at least [t1] seconds. After that, cache non-existence of a h/w address for that IP address for the normal positive caching time. If you see any IP traffic inbound on that interface with that IP address, remove the negative cache entry. However, to get a positive cache entry you still need a valid ARP response (promiscuous or not).

More formally, when address resolution is required:

a) Look up IP address in ARP table
   i) If entry is PRESENT (i.e. h/w address OK) return this value.
   ii) If entry is NEXIST return ARP failure immediately (i.e. as a router, drop into the code where no route is found - on Cisco this would be rate-limited unreachables)
   iii) If entry is INCOMPLETE[n] go to (b), transmitting a further ARP packet ONLY if the entry is fully aged (i.e. otherwise perform your RFC826 compatible / current operation without transmitting another ARP packet)
   iv) If entry is absent, transmit ARP packet as normal, set entry to INCOMPLETE and go to (b)

b) [this is the action we perform if we don't yet know the h/w address]. RFC826 suggests returning and allowing a higher layer to retransmit, though I suppose blocking is theoretically possible.

If a valid ARP response is received (promiscuous or otherwise), remove any existing entry, and generate a PRESENT entry.
If /any/ packet is received with a valid source IP address, remove the NEXIST entry for that address if present (on the ARP table for the interface on which it was received only) [this check is arguably too thorough, as it will remove valid NEXIST entries for IP addresses that exist but sit behind a router on the current subnet rather than on it directly; however this is (a) better than nothing, and (b) required to support proxy ARP properly; note that you can't rely on the MAC address being that of the IP though - you still have to ARP]

Age INCOMPLETE[n] states to INCOMPLETE[n+1] states after [t1] seconds (probably about 1 second), for n<N, and to NEXIST for n>=N (N is probably about 5).

Age NEXIST state to deleted after about [t2] seconds (where t2 is probably close to the ARP timeout - i.e. about 300).

INCOMPLETE essentially means PENDING.

2. It has been observed that Cisco products in particular do not handle ARP storms well. Even worse is the Catalyst 500. This may have been fixed since I saw it. The application in which I saw it seriously merited having a Linux box or similar 'proxy'-ARP all non-existent addresses to null. You can probably achieve the same result with static ARP entries to a non-existent h/w address.

Alex Bligh
Personal Capacity
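
For illustration, the negative-caching scheme in point 1 could be sketched roughly as follows. This is only a minimal model under stated assumptions, not code from any router or from RFC826: the names NegativeArpCache, resolve, on_arp_reply, on_ip_packet and the send_arp callback are all hypothetical, time is passed in explicitly, aging is done lazily at lookup time rather than by a timer, and aging of PRESENT entries is left out. The defaults n_max=5, t1=1 and t2=300 follow the "probably about" values in the post.

```python
import enum
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class State(enum.Enum):
    PRESENT = "PRESENT"        # h/w address known
    INCOMPLETE = "INCOMPLETE"  # ARP sent, no answer yet (i.e. PENDING)
    NEXIST = "NEXIST"          # negative entry: host did not answer


@dataclass
class Entry:
    state: State
    stamp: float               # time the entry entered its current step
    hw: Optional[str] = None   # h/w address, only set when PRESENT
    n: int = 0                 # the n in INCOMPLETE[n]; assumed to start at 1


class NegativeArpCache:
    """Hypothetical per-interface ARP table with negative caching."""

    def __init__(self, send_arp: Callable[[str], None],
                 n_max: int = 5, t1: float = 1.0, t2: float = 300.0):
        self.send_arp = send_arp  # assumed callback that emits one ARP request
        self.n_max, self.t1, self.t2 = n_max, t1, t2
        self.table: Dict[str, Entry] = {}

    def resolve(self, ip: str, now: float) -> Optional[str]:
        """Return the h/w address, or None if unknown (failed or pending)."""
        e = self.table.get(ip)
        # Age NEXIST to deleted after t2 seconds (done lazily here).
        if e is not None and e.state is State.NEXIST and now - e.stamp >= self.t2:
            del self.table[ip]
            e = None
        if e is None:
            # (a)(iv): absent, so transmit as normal and mark INCOMPLETE.
            self.send_arp(ip)
            self.table[ip] = Entry(State.INCOMPLETE, now, n=1)
            return None
        if e.state is State.PRESENT:
            return e.hw            # (a)(i)
        if e.state is State.NEXIST:
            return None            # (a)(ii): fail at once, no ARP sent
        # (a)(iii): INCOMPLETE[n]; act only once the current step has aged.
        if now - e.stamp >= self.t1:
            if e.n >= self.n_max:
                e.state, e.stamp = State.NEXIST, now
            else:
                e.n, e.stamp = e.n + 1, now
                self.send_arp(ip)  # the one request allowed in flight
        return None

    def on_arp_reply(self, ip: str, hw: str, now: float) -> None:
        # A valid ARP response replaces any existing entry with PRESENT.
        self.table[ip] = Entry(State.PRESENT, now, hw=hw)

    def on_ip_packet(self, src_ip: str) -> None:
        # Any inbound IP traffic from src_ip clears a negative entry only;
        # a positive entry still requires a real ARP response.
        e = self.table.get(src_ip)
        if e is not None and e.state is State.NEXIST:
            del self.table[src_ip]
```

With a cache built as NegativeArpCache(send_arp=some_function), a scan that hits the same dead address many times per second causes at most n_max ARP requests, spaced t1 apart, followed by roughly t2 seconds of immediate failures with no ARP traffic at all. That is the property that would protect the router from the flood described above.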