North American Network Operators Group


Re: large organization nameservers sending icmp packets to dns servers.

  • From: Paul Vixie
  • Date: Fri Aug 10 19:42:55 2007

> Your comments have helped.

groovy.

> When TCP is designed to readily fail, reliance upon TCP seems questionable.

i caution against being overly cautious about DNS TCP if you're using RFC 1035
section 4.2.2 as your basis for special caution.  DNS TCP only competes
directly against other DNS TCP.  there are only two situations where a DNS TCP
state blob is present in a DNS target ("server") long enough to be in any
danger: when doing work upstream to fulfill the query, and in zone transfers.

when answering DNS TCP queries in an authority server, there is by definition
no "upstream work" to be done, other than possible backend database lookups
which are beyond the scope of this discussion.  these responses will usually
be generated synchronous to the receipt of the last octet of a query, and the
response will be put into the TCP window (if it's open, which it usually is),
and the DNS target ("server") will then wait for the initiator ("client") to
close the connection or send another query.  (usually it's a close.)

when answering DNS TCP zone transfer requests in an authority server, there is
a much larger window of doom, during which spontaneous network congestion can
close the outgoing TCP window and cause a DNS target ("server") to think that
a TCP session is "idle" for the purpose of RFC 1035 section 4.2.2 TCP resource
management.  while incremental zone transfer is slightly less prone to this
kind of doom than full zone transfer, since the sessions are shorter, it can
take some time for the authority server to compute incremental zone "diffs",
during which the TCP session may appear "idle" through no fault of the DNS
initiator ("client") who is avidly waiting for its response.

lastly, when answering DNS TCP queries in a recursive caching nameserver, it
can take a while (one or more round trips to one or more authority servers)
before there is enough local state to satisfy the query, during which time the
TCP resources held by that query might be reclaimed under RFC 1035 section
4.2.2's rules.
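
to make the "appears idle" problem concrete, here is a minimal sketch of a
naive idle timer.  it is not taken from any real nameserver; the session
labels and timestamps are made up, and the only number borrowed from RFC 1035
section 4.2.2 is the "on the order of two minutes" idle period.

```python
from dataclasses import dataclass

# RFC 1035 section 4.2.2: wait until a connection has been idle "on the
# order of two minutes" before closing it to reclaim resources.
IDLE_LIMIT_SECONDS = 120.0


@dataclass
class TcpSession:
    label: str       # hypothetical name, for illustration only
    last_io: float   # timestamp (seconds) of the last octet read or written


def reclaim_candidates(sessions, now, idle_limit=IDLE_LIMIT_SECONDS):
    """Return labels of sessions a naive idle timer would consider dormant."""
    return [s.label for s in sessions if now - s.last_io >= idle_limit]


# Made-up timestamps: the timer only sees the time of the last I/O, not
# the reason the session has gone quiet.
sessions = [
    TcpSession("authority-query", last_io=299.9),  # answered on receipt
    TcpSession("recursive-query", last_io=170.0),  # waiting on upstream
    TcpSession("axfr-stalled", last_io=100.0),     # outbound window closed
]
candidates = reclaim_candidates(sessions, now=300.0)
print(candidates)
```

the timer cannot tell a client that has gone away from one that is still
waiting for an answer, which is why the recursive and zone transfer cases
are the exposed ones.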

the reason not to be overly cautious about TCP is that in the case where
you're an authority server answering a normal query, the time window during
which network congestion could close the outbound TCP window long enough for
RFC 1035 section 4.2.2's rules to come into effect is vanishingly short.  so
while it's incredibly unwise to depend on zone transfer working from a small
number of targets to a large number of initiators, and it is in fact wise to
firewall or ACL your stealth master server so that only your designated
secondary servers can reach it, none of this comes into play for normal
queries to authority servers -- only zone transfers to authority servers.
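
the ACL idea for a stealth master can be sketched as a simple source address
check.  this is an illustration only, not any nameserver's configuration
syntax; the addresses are documentation prefixes and the function name is
hypothetical.

```python
import ipaddress

# Hypothetical designated secondaries (documentation address space).
DESIGNATED_SECONDARIES = [
    ipaddress.ip_network("192.0.2.53/32"),
    ipaddress.ip_network("2001:db8::53/128"),
]


def transfer_allowed(client_ip, allowed=DESIGNATED_SECONDARIES):
    """Return True only if client_ip belongs to a designated secondary."""
    addr = ipaddress.ip_address(client_ip)
    # Membership tests across IPv4/IPv6 simply evaluate to False.
    return any(addr in net for net in allowed)


print(transfer_allowed("192.0.2.53"))    # a designated secondary
print(transfer_allowed("198.51.100.7"))  # anyone else
```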

the unmanageable risk is when a recursive caching nameserver receives a 
query by TCP and forwards/iterates upstream.  if this happens too often, then
the RFC 1035 section 4.2.2 rules will really hurt.  and thus, it's wise, just
as you say, to try to make sure other people don't have to use TCP to fetch
data about your zone.  the counterintuitive thing is that you won't be able
to measure the problems at your authority server since that's not where they
will manifest.  they'll manifest at caching recursive servers downstream.

> As DNSSEC is introduced, TCP could be relied upon in the growing number of
> instances where UDP is improperly handled.

this would be true if TCP fallback was used when EDNS failed.  it's not.
if EDNS fails, then EDNS will not be used, either via UDP or TCP.  so if
improper handling of UDP prevents EDNS from working, then EDNS and anything
that depends on EDNS, including DNSSEC, will not be used.
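
a simplified model of that retry logic, as described above, might look like
the following.  the outcome labels and the function name are hypothetical,
and real resolvers track far more state than this; the point is only that
truncation changes the transport while an EDNS failure drops EDNS.

```python
def next_attempt(use_edns, transport, outcome):
    """Pick the retry for a query, given how the last attempt ended.

    outcome is one of "answer", "truncated" or "edns_failure"; these labels
    are hypothetical, chosen for this sketch.
    Returns None when done, else a (use_edns, transport) pair.
    """
    if outcome == "answer":
        return None                   # done, nothing to retry
    if outcome == "truncated":
        return (use_edns, "tcp")      # truncated reply: same query over TCP
    if outcome == "edns_failure":
        return (False, transport)     # drop EDNS, keep the transport
    raise ValueError(f"unknown outcome: {outcome!r}")


print(next_attempt(True, "udp", "truncated"))
print(next_attempt(True, "udp", "edns_failure"))
```

in this model there is no path from an EDNS failure to "EDNS over TCP", so
anything that needs EDNS is lost once EDNS is judged to have failed.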

> UDP handling may have been easier had EDNS been limited to 1280 bytes.

if you mean, had EDNS been limited to nonfragmentation cases, then i think
you might mean 576 bytes or even 296 bytes.  1280 is an IPv6 (new era) limit.
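
for a sense of scale, here is the arithmetic if those numbers are read as
whole IP datagram sizes (an assumption; they could also be read as DNS
message sizes).  the header sizes assume no IPv4 options and no IPv6
extension headers.

```python
IPV4_HEADER = 20  # octets, assuming no IP options
IPV6_HEADER = 40  # octets, fixed header only, no extension headers
UDP_HEADER = 8    # octets


def max_dns_payload(datagram_size, ip_header):
    """Octets left for the DNS message after the IP and UDP headers."""
    return datagram_size - ip_header - UDP_HEADER


print(max_dns_payload(576, IPV4_HEADER))   # 548
print(max_dns_payload(296, IPV4_HEADER))   # 268
print(max_dns_payload(1280, IPV6_HEADER))  # 1232
```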

> On the other hand, potentially larger messages may offer the necessary
> motivation for adding ACLs on recursive DNS, and deploying BCP 38.

i surely do hope so.  we need those ACLs and we need that deployment, and if
message size and TCP fallback are motivators, then let's turn UP the volume.