North American Network Operators Group


Re: TCP session disconnection caused by Code Red?

  • From: Alex Bligh
  • Date: Mon Aug 06 15:52:54 2001


I've been told (but not given permission to forward details of
who/how/what) that some major sites with a single router and a
relatively flat network topology are dying because of the ARP
request flood generated by Code Red scans on the inside of their
border router, which chokes the router. Check the rate of ARP
requests coming off your border router and see if it seems
excessive; if so, that may be it.
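ARP requests are broadcast, so any host on the inside segment sees the same requests the router sends. One way to check the rate is to capture them there (for example with `tcpdump -i <interface> -w arp.pcap arp`) and count requests per second. The Python sketch below does that. It assumes a classic libpcap file (not pcapng) of untagged Ethernet II frames with microsecond timestamps, and its function names are hypothetical. It counts every ARP request seen, not only those sourced by the router. Filtering on the router's MAC address would be a small addition.

```python
import struct
import sys
from collections import Counter
from typing import BinaryIO, Iterable, Iterator, Tuple

ETHERTYPE_ARP = 0x0806
ARP_OP_REQUEST = 1
LINKTYPE_ETHERNET = 1


def read_pcap(f: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, frame) pairs from a classic microsecond libpcap file."""
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    if header[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"          # file written on a little-endian host
    elif header[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"          # file written on a big-endian host
    else:
        raise ValueError("not a classic microsecond pcap file")
    (linktype,) = struct.unpack(endian + "I", header[20:24])
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")
    while True:
        rec = f.read(16)
        if len(rec) < 16:
            return            # end of file (or truncated record header)
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        frame = f.read(incl_len)
        if len(frame) < incl_len:
            return            # truncated final packet
        yield ts_sec + ts_usec / 1e6, frame


def is_arp_request(frame: bytes) -> bool:
    """True if an untagged Ethernet II frame carries an ARP request (opcode 1)."""
    if len(frame) < 22:       # 14-byte Ethernet header + first 8 bytes of ARP
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_ARP:
        return False
    (opcode,) = struct.unpack("!H", frame[20:22])
    return opcode == ARP_OP_REQUEST


def arp_requests_per_second(frames: Iterable[Tuple[float, bytes]]) -> Counter:
    """Count ARP requests in each whole-second bucket of capture time."""
    counts: Counter = Counter()
    for ts, frame in frames:
        if is_arp_request(frame):
            counts[int(ts)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as capture:
        rates = arp_requests_per_second(read_pcap(capture))
    if rates:
        second, peak = rates.most_common(1)[0]
        print(f"peak: {peak} ARP requests in second {second}")
        print(f"busy seconds: {len(rates)}, total requests: {sum(rates.values())}")
```

Pass the capture file as the only argument. What counts as excessive depends on the segment. Comparing against a capture from a quiet period is probably more telling than any fixed threshold.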

2 points:

1. RFC826 appears to mandate only positive ARP caching. I can't
  see a reason why negative ARP caching shouldn't work this
  way:

  Keep only one ARP request in flight at a time. Retry ARPs
  a maximum of [5] times, separated by at least [1] second.
  After that, cache the non-existence of a h/w address for that
  IP address for the normal positive caching time. If you see any
  IP traffic inbound on that interface with that source IP address,
  remove the negative cache entry. However, to get a positive cache
  entry you still need a valid ARP response (promiscuous or not).

  More formally, when address resolution is required:

  a) Look up IP address in ARP table
     i)   If entry is PRESENT (i.e. h/w address OK)
          return this value.
     ii)  If entry is NEXIST return ARP failure
          immediately (i.e. as a router, drop into
          the code where no route is found - on Cisco
          this would be rate-limited unreachables)
     iii) If entry is INCOMPLETE[n], go to (b), transmitting
          a further ARP packet ONLY if the entry has fully
          aged since the last ARP packet was sent for it;
          otherwise perform your current RFC826-compatible
          operation without transmitting another ARP packet
     iv)  If entry is absent, transmit an ARP packet
          as normal, set entry to INCOMPLETE[0] and go to (b)
  b) [this is the action we perform if we don't yet
     know the h/w address]. RFC826 suggests discarding the
     packet and returning, allowing a higher layer to
     retransmit, though I suppose blocking is theoretically
     possible.

  If a valid ARP response is received (promiscuous or
  otherwise), remove any existing entry, and generate
  a PRESENT entry.

  If /any/ packet is received from a valid source IP
  address, remove any NEXIST entry for that address (on the
  ARP table for the interface on which it was received only)
  [this check is arguably too broad, as it will remove
  valid NEXIST entries for IP addresses that exist but sit
  behind a router on the current subnet rather than on it
  directly; still, it is (a) better than nothing, and (b)
  required to support proxy ARP properly. Note that you can't
  rely on the source MAC address being that of the IP, though;
  you still have to ARP]

  Age INCOMPLETE[n] states to INCOMPLETE[n+1] states after
  [t1] seconds (probably about 1 second), for n<N, and to
  NEXIST for n>=N (N is probably about 5)

  Delete NEXIST entries after about [t2] seconds (where
  t2 is probably close to the ARP timeout, i.e. about 300)

  INCOMPLETE essentially means PENDING

2. It has been observed that Cisco products in particular do not
  handle ARP storms well. The Catalyst 5000/5500 is even worse. This
  may have been fixed since I saw it. In the deployment where I saw
  it, it was well worth having a Linux box or similar 'proxy'-ARP
  all non-existent addresses to null. You can probably achieve the
  same result with static ARP entries pointing at a non-existent
  h/w address.
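The negative-caching scheme in point 1 can be sketched as a small state machine. This is a minimal Python sketch, not a definitive implementation, and it makes assumptions the post does not:

- a single interface;
- a caller-supplied `send_arp` callback;
- explicit timestamps instead of a timer, with aging applied lazily whenever an entry is touched;
- no expiry of PRESENT entries;
- t1 = 1 s, N = 5 and t2 = 300 s.

It reads (a)(iii) as "send another request only if at least t1 seconds have passed since the last one for this address". All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Union

# Assumed parameter values, taken from the post's bracketed suggestions.
T1 = 1.0    # seconds spent in each INCOMPLETE[n] step
N = 5       # INCOMPLETE[N] ages to NEXIST instead of INCOMPLETE[N+1]
T2 = 300.0  # seconds a NEXIST entry lives before it is deleted


@dataclass
class Present:
    mac: str            # resolved h/w address


@dataclass
class Incomplete:
    created: float      # when the first ARP request was sent (INCOMPLETE[0])
    last_sent: float    # when the most recent ARP request went out


@dataclass
class Nexist:
    since: float        # when the entry became NEXIST


Entry = Union[Present, Incomplete, Nexist]


class NegativeArpCache:
    """ARP table for one interface with negative caching (hypothetical API)."""

    def __init__(self, send_arp: Callable[[str], None]) -> None:
        self.send_arp = send_arp            # transmits one ARP request for an IP
        self.table: Dict[str, Entry] = {}

    def _age(self, ip: str, now: float) -> Optional[Entry]:
        """Apply INCOMPLETE[n] -> INCOMPLETE[n+1] -> NEXIST -> deleted lazily."""
        entry = self.table.get(ip)
        if isinstance(entry, Incomplete) and (now - entry.created) // T1 > N:
            entry = Nexist(since=entry.created + (N + 1) * T1)
            self.table[ip] = entry
        if isinstance(entry, Nexist) and now - entry.since >= T2:
            del self.table[ip]
            entry = None
        return entry

    def resolve(self, ip: str, now: float) -> Optional[str]:
        """Return the h/w address for ip, or None if the packet must be dropped."""
        entry = self._age(ip, now)
        if isinstance(entry, Present):      # (a)(i)
            return entry.mac
        if isinstance(entry, Nexist):       # (a)(ii): fail at once, send nothing
            return None
        if isinstance(entry, Incomplete):   # (a)(iii): re-ARP at most every T1
            if now - entry.last_sent >= T1:
                self.send_arp(ip)
                entry.last_sent = now
            return None                     # (b): drop, let a higher layer retry
        self.send_arp(ip)                   # (a)(iv): no entry yet
        self.table[ip] = Incomplete(created=now, last_sent=now)
        return None

    def on_arp_reply(self, ip: str, mac: str) -> None:
        """Any valid ARP reply replaces whatever entry exists with PRESENT."""
        self.table[ip] = Present(mac)

    def on_ip_packet(self, src_ip: str, now: float) -> None:
        """Inbound IP from src_ip clears NEXIST; it never creates PRESENT."""
        if isinstance(self._age(src_ip, now), Nexist):
            del self.table[src_ip]
```

With these values, a scan that keeps sending packets to one unused address triggers at most six ARP requests: the initial one plus five retries, at least one second apart. After that, packets to the address fail immediately without any ARP traffic. This lasts until the NEXIST entry expires about 300 s later, or until a packet arrives from that address.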

Alex Bligh
Personal Capacity