North American Network Operators Group


Re: Follow-up to "ROOT SERVERS"

  • From: Brett Frankenberger
  • Date: Sat Aug 26 19:36:31 2000

> 
> Further the Network Solutions press release states the impact was negligible
> because DNS resolvers look for multiple servers.  This is only partially
> true.  DNS resolvers look for other servers only when a server is unavailable.
> However, when a server has incorrect information, such as a root zone missing
> a delegated zone, won't the other servers return NXDOMAIN which resolvers
> will assume is an authoritative answer?  Therefore any user of those four
> other servers would have received authoritative answers that .COM did not
> exist?  How many queries do those four servers normally handle?

The impact probably was rather small, both because resolvers look for
multiple servers (not because they try another one after receiving an
authoritative NXDOMAIN, but because only some fraction of resolvers
would have tried a broken root server first -- other resolvers, trying
other roots, would not have had the issue); and because the missing
domain was ".COM" which is likely cached in most resolvers anyway.
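
A simplified model of the resolver behavior discussed above may help. A resolver moves on to another root only when one fails to answer, and it treats any answer it does get, including NXDOMAIN, as final. The server dictionaries, the `query`/`resolve` functions and the "ok"/"broken"/"down" states are hypothetical. Real resolvers also decide which server to ask first (for example by measured round-trip time), which is why different caches end up preferring different roots.

```python
from typing import Optional


class Timeout(Exception):
    """Raised when a (simulated) server does not answer at all."""


def query(server: dict, name: str) -> str:
    # Hypothetical server model: "down" servers never answer,
    # "broken" ones deny the name, "ok" ones return the delegation.
    if server["state"] == "down":
        raise Timeout(f"{server['name']} did not answer for {name}")
    if server["state"] == "broken":
        return "NXDOMAIN"
    return "REFERRAL"


def resolve(servers: list, name: str) -> Optional[str]:
    # Try servers in preference order, moving on ONLY when one is unreachable.
    for server in servers:
        try:
            return query(server, name)  # any answer, even NXDOMAIN, ends the search
        except Timeout:
            continue
    return None  # every server timed out


print(resolve([{"name": "a", "state": "broken"}, {"name": "b", "state": "ok"}], "com"))  # NXDOMAIN
print(resolve([{"name": "a", "state": "down"}, {"name": "b", "state": "ok"}], "com"))    # REFERRAL
```

Only the first server in a cache's preference list matters unless it is unreachable, so only caches that happened to prefer a broken root saw the bad answer.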

So the only impact was to clients of caching servers who (a) expired
.COM from their cache (or restarted) during the interval of brokenness,
and (b) proceeded to attempt to refresh it from a broken root.  And
even then, those clients would have only been impacted to the extent
that they accessed a sub-domain of .COM that was not cached.  (So, for
example, clients of a server that met requirements (a) and (b) above
could probably still get to, say, yahoo.com, because chances are
yahoo.com and .com wouldn't both expire from the cache during the
period of brokenness.  But obscurecompany.com probably would not have
been reachable from the same caching server, because it likely would
not have been in cache.)
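
The conditions above, together with the cache check, can be written as a small predicate. This is a hypothetical sketch. It models the cache as a plain set of names still within their TTL, with `"com"` standing for the cached COM delegation, and it ignores the root NS records themselves and any negative caching.

```python
def lookup_fails(cache: set, name: str, root_is_broken: bool) -> bool:
    """Return True if a client lookup of `name` would end in NXDOMAIN."""
    if name in cache:
        return False          # answered from cache; no root involved
    if "com" in cache:
        return False          # COM delegation cached; ask the COM servers directly
    return root_is_broken     # conditions (a) and (b): the cache must ask a root


# COM expired, but yahoo.com is still cached: the lookup works.
print(lookup_fails({"yahoo.com"}, "yahoo.com", root_is_broken=True))           # False
# Neither COM nor the target is cached, and a broken root is consulted: it fails.
print(lookup_fails({"yahoo.com"}, "obscurecompany.com", root_is_broken=True))  # True
```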

The NSI posting indicated that trouble reports came in at 18:30.  If we
assume it took at most half an hour for them to receive reports, the
failure would have started no earlier than 18:00.  According to the NSI
post, three servers were corrected at 19:00 and one at 19:50.
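
The outage windows used below follow from these times. In this sketch the calendar date is an arbitrary placeholder (only clock times appear in the post), and the 30-minute reporting delay is the assumption stated above.

```python
from datetime import datetime, timedelta

DAY = (2000, 1, 1)                        # placeholder date; only differences matter
first_report = datetime(*DAY, 18, 30)     # first trouble reports, per the NSI post
assumed_delay = timedelta(minutes=30)     # assumption: at most 30 min before reports arrived
start = first_report - assumed_delay      # estimated start of the failure

three_fixed = datetime(*DAY, 19, 0)       # three roots corrected
last_fixed = datetime(*DAY, 19, 50)       # last root corrected

window_three = three_fixed - start
window_last = last_fixed - start
print(start.time(), window_three, window_last)  # 18:00:00 1:00:00 1:50:00
```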

There are 13 authoritative servers for ".".  Assuming each one is
"preferred" by 1/13 of the caching servers out there, that means that:
  69% (9/13) of the caching servers would have been querying a
  non-broken root;
  23% (3/13) would have been querying a broken root for 1 hour;
  8% (1/13) would have been querying a broken root for about 2 hours.
The COM NS records have a TTL of 6 days, or 144 hours.  Assuming
expiry times are spread evenly across that TTL, about 1/144 of the
servers in the second group would have expired COM during the
brokenness and thus actually queried for .COM and received an
NXDOMAIN; about 2/144, or 1/72, of the third group would have done
the same.  So the percentage of caching servers we can expect to have
failed is:
  3/13*1/144 + 1/13*2/144 = 5/1872, or about 0.27 percent.
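
The arithmetic above can be checked with exact fractions. The sketch keeps the same assumptions: each root is preferred by 1/13 of caches, and COM expiries are spread evenly over the 144-hour TTL.

```python
from fractions import Fraction

ROOTS = 13
TTL_HOURS = 144  # 6-day TTL on the COM NS records

# (number of broken roots in the group, hours they stayed broken)
groups = [(3, 1), (1, 2)]

# Share of caches preferring those roots, times the chance that COM
# expired from such a cache during the broken window (window / TTL).
failed = sum(Fraction(n, ROOTS) * Fraction(h, TTL_HOURS) for n, h in groups)
print(failed, f"{float(failed):.2%}")  # 5/1872 0.27%
```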

Not good, but certainly not catastrophic or widespread.  The NSI
release was technically inaccurate, but not far off the mark in terms
of impact.

It's also incorrect to say everything was OK at 19:50, though.  All
the root servers were apparently functioning properly then, but the
NXDOMAIN for COM likely remained cached for considerably longer in
those 0.27% of servers.
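
How much longer depends on the negative-caching TTL. For a resolver that follows RFC 2308, that TTL is the smaller of the TTL of the SOA record returned with the NXDOMAIN and that SOA's MINIMUM field. The SOA values and receipt time below are made up for illustration. Many resolvers also impose their own upper bound, and a resolver that predates RFC 2308 may behave differently.

```python
from datetime import datetime, timedelta


def negative_cache_expiry(received: datetime, soa_ttl: int, soa_minimum: int) -> datetime:
    """Return when a cached NXDOMAIN expires under RFC 2308 rules.

    The negative-cache TTL is min(SOA record TTL, SOA MINIMUM), in seconds.
    """
    return received + timedelta(seconds=min(soa_ttl, soa_minimum))


# Hypothetical: NXDOMAIN for COM received at 19:49, one minute before the last
# root was fixed, with made-up SOA values (TTL 1 day, MINIMUM 1 hour).
received = datetime(2000, 1, 1, 19, 49)  # placeholder date
expires = negative_cache_expiry(received, soa_ttl=86400, soa_minimum=3600)
print(expires)  # 2000-01-01 20:49:00
```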

     -- Brett