North American Network Operators Group

Re: DNS Timeout Errors

  • From: Simon Leinen
  • Date: Thu Dec 09 17:35:12 2004

Jay,

> Is anyone else experiencing DNS timeout errors?  I've tried using
> multiple name resolvers, and tested multiple domain names using
> different name servers, and I keep getting "name not found" errors.

> When I try the same domain name a second time, it resolves OK.  This
> all started a few days ago.

About three weeks ago, some of our users told us that they were
experiencing many DNS resolution failures while surfing the Web.  We
analyzed this, and part of the explanation we came up with may apply
to others as well, especially if the answer to both of the following
questions is yes:

Are you using BIND 9 on the recursive nameserver that you normally use?
If so, does the installation of BIND 9 on your recursive nameserver
include support for DNS queries over IPv6?

BIND 9 seems to have trouble when a name server responds fine over
IPv4 but responds poorly (or not at all) over IPv6, e.g. because IPv6
connectivity between you and the server is somehow broken: in some
situations it will keep querying the name server at its unresponsive
IPv6 address.  I have seen this a lot when tracing IPv6 DNS queries
from our recursive name servers(*).

This can be very noticeable, especially since A.GTLD-SERVERS.NET and
B.GTLD-SERVERS.NET now have AAAA records (IPv6 addresses).  Many
ccTLDs - including ours - have recently added IPv6-reachable name
servers, too.
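
One way to check whether a given name server answers over both address
families is to send it the same query over IPv4 and over IPv6 and
compare the results.  The following is a minimal Python sketch under
stated assumptions, not a definitive tool: the function names
(build_query, probe, compare_families), the 3-second timeout and the
choice of an NS query for "com." are all made up for illustration.

```python
import socket
import struct
import time


def build_query(name, qtype=2, query_id=0x1234):
    """Build a minimal DNS query packet (default QTYPE 2 = NS, class IN)."""
    # Header: ID, flags (standard query, recursion not requested),
    # QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels)
    qname += b"\x00"  # root label terminates the name
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(address, family, name="com.", timeout=3.0, port=53):
    """Send one UDP query; return the RTT in seconds, or None on failure."""
    packet = build_query(name)
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(packet, (address, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and "network unreachable"
            return None
        if reply[:2] != packet[:2]:  # query ID must match
            return None
        return time.monotonic() - start


def compare_families(hostname, name="com.", timeout=3.0):
    """Probe every IPv4 and IPv6 address of hostname; map it to its RTT."""
    results = {}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(hostname, 53, family, socket.SOCK_DGRAM)
        except socket.gaierror:
            continue  # no address of this family could be resolved
        for info in infos:
            address = info[4][0]
            results[(label, address)] = probe(address, family, name, timeout)
    return results


if __name__ == "__main__":
    for key, rtt in compare_families("a.gtld-servers.net").items():
        print(key, "no response" if rtt is None else "%.3f s" % rtt)
```

A "no response" for the IPv6 address next to a normal RTT for the IPv4
address of the same server would be consistent with the situation
described above, although a single lost UDP packet proves little, so
repeated probes are advisable.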

I'm wondering whether many users are seeing this, but I have no idea
how to gather data on this, especially historical data.  (Except maybe
trying to correlate access times from server logs of popular Web
servers that refer to each other.)

I'm attaching a message from comp.protocols.dns.bind that refers to
this problem.
-- 
Simon.

(*) In our case, our recursive name server was using the wrong source
    address for its queries, namely its anycast IPv6 address (Linux
    IPv6 source address selection sucks!), so it would often not
    receive a response to a query over IPv6, because the response
    would end up at another anycast instance.

    But I assume the more common case is that the IPv6 queries don't
    reach the authoritative name server at all, because the recursive
    name server doesn't have global IPv6 connectivity.  The IPv6
    connectivity problem may also lie at the end of an important
    authoritative server, and it will still cause problems.

--- Begin Message ---
  • Newsgroups: comp.protocols.dns.bind
  • Xref: ezmp2.switch.ch comp.protocols.dns.bind:13233
> Hello List --
> 
> I tried searching for this in the archives and didn't see anything
> conclusive.
> 
> We are an ISP with caching resolvers running BIND9.2.2 on Solaris 8 that
> are not behind firewalls.  Upon running scripts to test unrelated issues,
> I noticed that any time I queried any of my resolvers for domains that
> have not been cached, the recursive query response times are horrible --
> consistently over 4 seconds.  If I clear the cache and run a script that
> digs over 100 random domains, all of them come back > 4 seconds.  Nothing
> has changed on our resolvers' config in months.  Root hint file is up to
> date.  Dig +trace or debug isn't showing anything. Tcpdump/snoop shows
> nothing, other than an empty hole when the machine is waiting for a
> response back from any root server.  Queries against the boxes locally vs.
> queries from another machine make no difference.  We have tried boxes that
> have not been patched in months as well as up-to date machines.  All the
> same.
> 
> Here's the options we have:
> 
> 
> options {
> 
>         directory "/var/named";
> /*
> *
> */
>         max-ncache-ttl 10800;
>         transfers-in 25;
>         notify no;
>         allow-query { CSR; DEV; localhost; };
>         recursion yes;
>         recursive-clients 100000;
>         allow-transfer { none; };
>         interface-interval 0;
>         cleaning-interval 30;
>         blackhole { 10.0.0.0/8; 192.168.0.0/16; };
>         pid-file "named.pid";
> 
> };
> 
> 
> Although I would be happy to post more info for your review, my questions
> are these:  Has anyone else noticed this lag in recursion recently?  Can
> anyone on this list try clearing their cache and then running queries for
> random domains and noting the response time?
> 
> Curiously, an old BIND8 box we have does NOT experience this lag, no
> matter what.
> 
> Any insight you may have is appreciated.
> 
> Thanks
> 
> -Erik J
 

        Known issue which will be fixed in BIND 9.2.5/9.3.1.

        Workarounds:
        * upgrade to 9.3.0 and run "named -4".
        * configure --disable-ipv6.
        * get yourself IPv6 connectivity.

        A.GTLD-SERVERS.NET and B.GTLD-SERVERS.NET now have AAAA addresses,
        and the RTT estimates for them are not being penalised when you
        don't have IPv6 connectivity.

        Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: [email protected]


--- End Message ---
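
The remark in the attached message about RTT estimates not being
penalised can be illustrated with a toy model.  The Python sketch below
is NOT BIND's actual server-selection algorithm; it only assumes a
resolver that always queries the address with the lowest estimated RTT.
The function name (simulate), the address labels, the 2-second timeout
and the smoothing weights are all hypothetical.

```python
def simulate(estimates, actual_rtt, rounds, penalise_timeouts, timeout=2.0):
    """Toy selection loop: always query the address with the lowest estimate.

    estimates:  dict address -> initial RTT estimate in seconds
    actual_rtt: dict address -> real RTT in seconds, or None if unreachable
    Returns (list of chosen addresses, total seconds spent waiting).
    """
    estimates = dict(estimates)  # work on a copy
    chosen = []
    waited = 0.0
    for _ in range(rounds):
        address = min(estimates, key=estimates.get)
        chosen.append(address)
        rtt = actual_rtt[address]
        if rtt is None:
            # The query is lost; the resolver waits for the full timeout.
            waited += timeout
            if penalise_timeouts:
                # Push the estimate up so other addresses get preferred.
                estimates[address] = max(estimates[address] * 2, timeout)
        else:
            waited += rtt
            # Smooth the estimate towards the measured RTT.
            estimates[address] = 0.7 * estimates[address] + 0.3 * rtt
    return chosen, waited


if __name__ == "__main__":
    # Hypothetical server with an unreachable IPv6 address that starts
    # out with a better estimate than its working IPv4 address.
    initial = {"ns-over-ipv6": 0.01, "ns-over-ipv4": 0.05}
    reality = {"ns-over-ipv6": None, "ns-over-ipv4": 0.05}
    for penalise in (False, True):
        picks, waited = simulate(initial, reality, 5, penalise)
        print(penalise, picks.count("ns-over-ipv6"), round(waited, 2))
```

In this model, without the penalty the unreachable address keeps its
attractive estimate and is selected in every round, so every round
costs a full timeout.  With the penalty it is selected once and then
avoided.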