North American Network Operators Group


Re: mysterious packet delay to/from www.caida.org was: Cisco Netflow Analysis Software

  • From: Kai Schlichting
  • Date: Fri Mar 17 14:01:09 2000

Just a follow-up on how this was solved, with some actual operational
relevance :)

Apparently, a fairly large number of resolvers (old BIND4's?) have a
significant problem with being told about delegations to nameservers that
are NOT registered and hence have no A records for them in the root
nameservers.

A stern warning to network operators' DNS groups:
*do not delegate in-addr.arpa zones to unregistered nameservers or reverse
  resolution will break for many resolvers trying to resolve your clients' IPs*
This changes the practice from "SHOULD NOT" to "MUST NOT" for the
time being :(

(the resolution was to have the network operator change the delegation
back to the registered names until NSI finally changes the host
registrations; said operator was helpful with that as well)

The actual delay (at www.caida.org and other servers) is apparently a blocking
call to resolver code in Apache in the middle of serving the page: the
page comes up to a certain point, gets stuck on a reverse DNS lookup
which eventually times out, and then the rest of the page gets served.
(Hey CAIDA, turn off the DNS logging on your webserver!) It looks like a
real network transport or webserver performance problem, but isn't.
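
For anyone who wants to see the effect on their own box, here is a
minimal Python sketch that times a single blocking reverse lookup, the
same kind of call a logging webserver makes per connection. It is not
Apache's code; the function name timed_reverse_lookup and the
injectable resolver argument are made up for illustration.

```python
import socket
import time


def timed_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, seconds spent blocked in the resolver)."""
    start = time.monotonic()
    try:
        # gethostbyaddr() blocks until the PTR lookup answers or gives up.
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        hostname = None
    elapsed = time.monotonic() - start
    return hostname, elapsed


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address; substitute a client IP of yours.
    name, seconds = timed_reverse_lookup("192.0.2.1")
    print(f"{name!r} after {seconds:.1f}s")
```

A client IP behind one of the broken delegations should show up here as
a long wait followed by None. In Apache the relevant knob is, as far as
I know, the HostnameLookups directive.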

Actual chain of events: local machine connects to remote http/smtp/etc.
server, remote server tries to resolve PTR record via a PTR query to
ROOT-NS's. ROOT-NS's show delegation to ISP/NSP's nameservers.
ISP/NSP then delegates to unregistered NS's further down in the chain
(end-user, ISP customer).
If the machine trying to resolve the PTR record does NOT know the A
record for these delegatees (by having it in its cache, for example),
it makes no effort to recursively resolve such an A record (the
only servers asked for the A record are apparently the ROOT-NS's):
A RR's may exist for the servers it has to ask for the ultimate PTR
record, but it makes no attempt at an A RR query, and subsequently sends
no final query for the PTR record either.
What it does in the 20+ seconds until it times out is anyone's
guess. Depending on how many nameservers the remote http/smtp server
has in its resolv.conf file, this process will repeat a few times
(20-80s delay!).
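
The chain of events above can be put into a toy model. This is a sketch
of the observed behaviour only, not BIND's actual algorithm: the
hostname, the addresses, the function names and the flat 20 seconds per
configured nameserver are all assumptions picked to match the numbers
in this post.

```python
# Toy model only: every name, address and timeout below is made up for
# illustration and is not output from a real resolver.
REAL_A_RECORDS = {"ns1.customer.example": "192.0.2.53"}  # the A RR does exist

PER_SERVER_TIMEOUT = 20  # seconds; assumption based on the "20+ seconds" seen


def resolve_ptr(delegated_ns, cache, chases_a_records):
    """Return the address the PTR query is sent to, or None on timeout."""
    for ns in delegated_ns:
        addr = cache.get(ns)
        if addr is None and chases_a_records:
            # Stand-in for a proper recursive A lookup of the nameserver name.
            addr = REAL_A_RECORDS.get(ns)
        if addr is not None:
            return addr
    return None


def count_nameservers(resolv_conf_text):
    """Count 'nameserver' lines in the text of a resolv.conf file."""
    count = 0
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            count += 1
    return count


def worst_case_delay(resolv_conf_text, per_server=PER_SERVER_TIMEOUT):
    """Assume each configured nameserver is tried once and each try times out."""
    return count_nameservers(resolv_conf_text) * per_server


conf = "nameserver 192.0.2.1\nnameserver 192.0.2.2\n"
print(resolve_ptr(["ns1.customer.example"], {}, chases_a_records=False))  # None
print(resolve_ptr(["ns1.customer.example"], {}, chases_a_records=True))   # 192.0.2.53
print(worst_case_delay(conf))  # 40
```

With an empty cache and no attempt at the A query, the model never
finds an address to send the PTR query to, which is the failure
described above; a resolver that chases the A record, or one that
already has it cached, gets through.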

Given that delegation within end-user organizations rarely happens
anymore (the Internet is no longer made up of /16 .edu's, but of
/24 .com's that more often than not let their provider do the DNS
for them), this bug must have been discovered a long time after
it was introduced (and my speculation about older BIND4 code
is just that - speculation).

Thanks for the bandwidth,
bye,Kai

--
[email protected]             "Just say No" to Spam            Kai Schlichting
Palo Alto, New York, You name it             Sophisticated Technical Peon
Kai's SpamShield <tm> is FREE!                 http://SpamShield.Conti.nu
|                                                                       |
LeasedLines-FrameRelay-IPLs-ISDN-PPP-Cisco-Consulting-VoiceFax-Data-Muxes
WorldWideWebAnything-Intranets-NetAdmin-UnixAdmin-Security-ReallyHardMath