North American Network Operators Group

Re: mail does bounce (was: Customers down?)

  • From: Sean Donelan
  • Date: Sat Jan 01 23:57:00 2000

On Sat, 01 January 2000, Jim Duncan wrote:
> I recommend you not use three-digit RFCs as authoritative references.  
> I agree with your line of reasoning, but current arguments should be 
> supported by current observations and decisions, as documented by current 
> standards.  I see a lot of crap on the net that's crap fundamentally 
> because its development was based on a three-digit RFC. :-(

Whether or not you use three-digit RFCs as authoritative references, other
developers do, so you should expect to deal with such behavior.

Too many people think the motto "be conservative in what you send and liberal
in what you receive" only applies to the other guy.  He should never send
something I don't expect, and should accept anything I decide to send.

> The human element causes many problems, and yet it is the ultimate 
> solution for all of them.  Rather than strive solely to make things 
> perfect, try to make things as close to perfect as possible *and* put some 
> thought into what you will do when things fail.  Assume that "calling the 
> human element into play" is a part of making things perfect, whether 
> you're dealing with failure of timely delivery of a message to the 
> intended recipient or failure of network connectivity to fail over 
> gracefully to another path.

The question is how long you let things wallow in a state of computerized
uncertainty before you call the human.  Things fail, and sometimes people
create situations which make the failures even worse.  Do you immediately
inform the human, try a few times and then inform the human, or keep trying
until the human notices it's not working?  Different people make different

If a mailer wants to queue mail on a DNS server failure, that is acceptable.
But you shouldn't expect or depend on that behavior.  If you disconnect all
your name servers from the network, some applications can, and therefore you
should expect they will, treat it as a permanent error.
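
To make the difference concrete, here is a rough sketch of the decision a
mailer faces after a failed name lookup.  It is not how any particular MTA
works; the function name next_action, the strict flag, and the max_attempts
limit are all made up for illustration.  The only real pieces are Python's
socket.gaierror and the EAI_AGAIN / EAI_NONAME resolver codes.

```python
import socket


def next_action(error, attempts_so_far, max_attempts=3, strict=False):
    """Decide what a hypothetical mailer does after a failed host lookup.

    Returns "retry", "notify-human", or "bounce".
    """
    # EAI_AGAIN is the resolver's "temporary failure, try again later" code,
    # which is what you typically see when no name server can be reached.
    temporary = (
        isinstance(error, socket.gaierror)
        and error.errno == socket.EAI_AGAIN
    )

    # A strict application treats every lookup failure as permanent.
    # Anything that is not a temporary failure is permanent either way.
    if strict or not temporary:
        return "bounce"

    # A lenient application queues and retries, but only for so long
    # before it stops wallowing and calls the human.
    if attempts_so_far < max_attempts:
        return "retry"
    return "notify-human"


unreachable = socket.gaierror(socket.EAI_AGAIN, "temporary failure")
print(next_action(unreachable, attempts_so_far=1))               # lenient
print(next_action(unreachable, attempts_so_far=1, strict=True))  # strict
```

Both the lenient and the strict caller are within their rights, which is
why the sender can't depend on getting the lenient one.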

Whether you think it's the fault of the human who wrote the application,
or the human who put all the name servers on the same subnet, it shows
problems are rarely due to a single mistake.  Rather, it is a combination
of decisions, each of which may have appeared justified alone, that
causes trouble when taken together.

It's like the backhoe problem.  There would be no effect on customers if
either a) carriers provisioned sufficient alternate capacity OR b) a backhoe
didn't cut the fiber.  It's the combination of insufficient diverse capacity
AND a backhoe attack which causes the problem for the customer.  Unfortunately,
the first instinct is to blame the other guy.  So telcos blame the backhoes,
and the excavators blame the telcos.