North American Network Operators Group

Re: DNS TTL adherence

  • From: Simon Waters
  • Date: Thu Mar 16 04:14:28 2006

On Thursday 16 Mar 2006 04:23, you wrote:
> 
> You might consider the following paper from IMC 2004: "On the
> Responsiveness of DNS-based Network Control" by Jeffrey Pang, Aditya
> Akella, Anees Shaikh, Balachander Krishnamurthy, Srinivasan Seshan,
> http://www.imconf.net/imc-2004/papers/p21-pang.pdf

The results are greatly at odds with my experience.

As they imply, the problem may lie specifically with misconfigured ISP DNS 
servers, which might explain why we see fewer violations if our sites aren't 
popular with those ISPs' users.

However, I wouldn't trust any report in which the authoritative DNS itself 
wasn't explicitly monitored and reported on. The authors may think they have 
updated the authoritative answers (and TTL), but in my experience, when you 
find violators you often find that the authoritative DNS servers didn't all 
update as, or when, expected, or that those servers had earlier returned the 
old records with a longer TTL.
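Checking every authoritative server, rather than trusting that an update went out, might look something like the sketch below. It assumes the answers have already been collected by querying each authoritative server directly (for example with `dig +norecurse @server name A`); the `AuthAnswer` record and `find_inconsistent` function are hypothetical names, and the server names and addresses are documentation examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthAnswer:
    """One authoritative server's answer for a name (hypothetical shape)."""
    server: str
    addresses: frozenset[str]
    ttl: int  # seconds


def find_inconsistent(answers, expected_addresses, max_ttl):
    """Return (server, reason) pairs for servers that are not serving the
    expected records, or are serving them with a TTL above max_ttl."""
    expected = frozenset(expected_addresses)
    problems = []
    for a in answers:
        if a.addresses != expected:
            problems.append((a.server, "stale or unexpected records"))
        if a.ttl > max_ttl:
            problems.append((a.server, f"TTL {a.ttl}s exceeds {max_ttl}s"))
    return problems


# Example: a secondary that never picked up the change.
answers = [
    AuthAnswer("ns1.example.net", frozenset({"192.0.2.10"}), 600),
    AuthAnswer("ns2.example.net", frozenset({"198.51.100.7"}), 86400),
]
print(find_inconsistent(answers, {"192.0.2.10"}, max_ttl=600))
```

Run against every listed NS for each moved domain, both before and after the change, this gives the "control" record that a TTL-violation study would need.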

Certainly that was our experience moving many sites last week, when we could 
check the logs in real time and find which domains we had messed up from the 
traffic still arriving at the old server.

Looking at the four long-term violators for one site:

  Hits  Source IP
     8  198.78.130.68  <--- ??
     1  212.95.252.16  <--- lager.netcraft.com
    15  66.147.154.3   <--- IBM Almaden Research Center
     5  70.42.51.10    <--- Fast Search & Transfer
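A table like the one above is essentially a count of requests per client address in the old server's access log. A minimal sketch, assuming Apache-style common/combined log lines where the client address is the first whitespace-separated field; `hits_by_source` is a hypothetical name and the sample lines use documentation addresses.

```python
from collections import Counter
from typing import Iterable


def hits_by_source(log_lines: Iterable[str]) -> list[tuple[int, str]]:
    """Count requests per client IP, busiest source first.

    Assumes the client address is the first field of each line, as in
    Apache's common and combined log formats. Blank lines are skipped.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return [(n, ip) for ip, n in counts.most_common()]


sample = [
    '192.0.2.1 - - [13/Mar/2006:10:00:00 +0000] "GET / HTTP/1.0" 200 512',
    '198.51.100.2 - - [13/Mar/2006:10:01:00 +0000] "GET /a HTTP/1.0" 200 128',
    '192.0.2.1 - - [13/Mar/2006:10:02:00 +0000] "GET /b HTTP/1.0" 200 256',
]
for hits, ip in hits_by_source(sample):
    print(f"{hits:6d}  {ip}")
```

Fed the old server's log for a moved site, anything that shows up at all after the new TTL has expired is a candidate violator, to be cross-checked against the authoritative servers.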

During this period (starting 3 days after the move, with a 10-minute TTL) we 
saw 27234 hits (okay, not exactly a busy site) for that site on the correct 
server. So roughly 1 in 1000 hits during days 3 to 6 went to the old web 
server, and this domain had the most lost hits; most of the moved domains 
don't show up in the old server's log at all.

Of those, I think we can safely exclude at least 21 of the 29 as "non-human" 
(sorry, IBM Research, if you were deeply interested in proofreading); I'm 
guessing they have made a deliberate effort to cache stale data for their own 
reasons.

So, for our sites, I can put an upper estimate of 1 in 1000 on hits of 
interest going to the wrong server during days 3 to 6.
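The arithmetic behind the 1-in-1000 figure can be checked in a few lines. The counts come from the table and text above; treating the three annotated addresses as the 21 "non-human" hits is an assumption based on those annotations, and the denominator here counts hits on both servers.

```python
# Hits on the OLD server during days 3 to 6, from the table above.
stale_hits = {
    "198.78.130.68": 8,   # unidentified
    "212.95.252.16": 1,   # lager.netcraft.com
    "66.147.154.3": 15,   # IBM Almaden Research Center
    "70.42.51.10": 5,     # Fast Search & Transfer
}
correct_hits = 27234  # hits on the new (correct) server, same period

# Assumption: the three annotated crawler/robot sources are the "non-human" ones.
non_human = {"212.95.252.16", "66.147.154.3", "70.42.51.10"}

total_stale = sum(stale_hits.values())
excluded = sum(n for ip, n in stale_hits.items() if ip in non_human)
all_hits = correct_hits + total_stale

print(f"{total_stale} of {all_hits} hits went to the old server, "
      f"about 1 in {all_hits / total_stale:.0f}")
print(f"{excluded} of those came from the annotated robots")
```

This prints 29 of 27263, about 1 in 940, with 21 from the annotated robots, which is why 1 in 1000 serves as an upper estimate for hits that matter.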

The most popular site we moved had only two DNS violators during days 3 to 6, 
the more notable being the same "Fast Search & Transfer" IP as above.

It may be that popular sites have a far worse problem, by dint of exercising 
more caching code, but this site is far from our most popular. Also, these 
sites were moved by reducing the TTL to a low value (10 minutes) and keeping 
it there for a long period before we actually performed the move.
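The reason for lowering the TTL well ahead of the move is that a resolver which fetched the record just before the TTL was reduced may legitimately keep it for the full old TTL. A minimal scheduling sketch under that assumption; the function names, the one-day old TTL, the one-hour allowance for secondaries to pick up the change, and the dates are all hypothetical.

```python
from datetime import datetime, timedelta


def earliest_safe_move(ttl_lowered_at: datetime, old_ttl: timedelta,
                       propagation_slack: timedelta = timedelta(0)) -> datetime:
    """Earliest time to change the address records after lowering the TTL.

    Caches may hold the record with the OLD TTL for up to old_ttl after the
    change; propagation_slack allows for all authoritative servers (e.g.
    secondaries) to start serving the lowered TTL first.
    """
    return ttl_lowered_at + propagation_slack + old_ttl


def latest_expected_stale(moved_at: datetime, new_ttl: timedelta) -> datetime:
    """After the move, TTL-compliant caches should drop the old address
    within one short TTL; old-server traffic after this points to violators
    or to authoritative servers that did not update."""
    return moved_at + new_ttl


lowered = datetime(2006, 3, 1, 12, 0)  # hypothetical date
move = earliest_safe_move(lowered, old_ttl=timedelta(days=1),
                          propagation_slack=timedelta(hours=1))
print(move)
print(latest_expected_stale(move, timedelta(minutes=10)))
```

With these example values the move can go ahead at 2006-03-02 13:00:00, and compliant caches should stop sending traffic to the old server by 13:10.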