North American Network Operators Group
Re: mail does bounce (was: Customers down?)
On Sat, 01 January 2000, Jim Duncan wrote:
> I recommend you not use three-digit RFCs as authoritative references.
> I agree with your line of reasoning, but current arguments should be
> supported by current observations and decisions, as documented by current
> standards. I see a lot of crap on the net that's crap fundamentally
> because its development was based on a three-digit RFC. :-(

Whether or not you use three-digit RFCs as authoritative references, other developers do, so you should expect to deal with such behavior. Too many people think the motto "be conservative in what you send and liberal in what you receive" applies only to the other guy: he should never send something I don't expect, and should accept anything I decide to send.

> The human element causes many problems, and yet it is the ultimate
> solution for all of them. Rather than strive solely to make things
> perfect, try to make things as close to perfect as possible *and* put some
> thought into what you will do when things fail. Assume that "calling the
> human element into play" is a part of making things perfect, whether
> you're dealing with failure of timely delivery of a message to the
> intended recipient or failure of network connectivity to fail over
> gracefully to another path.

The question is how long you let things wallow in a state of computerized uncertainty before you call the human. Things fail, and sometimes people create situations which make the failures even worse. Do you inform the human immediately, try a few times and then inform the human, or keep trying until the human notices it's not working? Different people make different choices.

If a mailer wants to queue mail on a DNS server failure, that is acceptable. But you shouldn't expect or depend on that behavior. If you disconnect all your name servers from the network, some applications can, and therefore you should expect they will, treat it as a permanent error.

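
To make the three choices concrete, here is a minimal sketch of the decision as a function of how many consecutive failures have been seen. The function name `decide`, the constants, and the threshold of 3 are all hypothetical, picked only for illustration; no real mailer is claimed to work exactly this way.

```python
from typing import Optional

# Hypothetical policies: how many consecutive failures to tolerate
# before a human is told. None means "never tell anyone, just keep trying".
IMMEDIATE = 1      # treat the first failure as permanent and report it
A_FEW_TRIES = 3    # assumed threshold, chosen arbitrarily for this sketch
KEEP_TRYING = None


def decide(failed_attempts: int, max_attempts: Optional[int]) -> str:
    """Return 'notify' or 'retry' after `failed_attempts` consecutive failures."""
    if max_attempts is not None and failed_attempts >= max_attempts:
        return "notify"
    return "retry"


# The same run of failures, seen through the three policies.
for n in (1, 2, 3):
    print(n, decide(n, IMMEDIATE), decide(n, A_FEW_TRIES), decide(n, KEEP_TRYING))
```

The point of the sketch is that all three policies are defensible, so the sender of a message cannot know which one the application on the other end has picked.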
Whether you think it's the fault of the human who wrote the application or the human who put all the name servers on the same subnet, it shows that problems are rarely due to a single mistake. Rather, a combination of decisions, each of which may have appeared justified alone, causes trouble when taken together.

It's like the backhoe problem. There would be no effect on customers if either a) carriers provisioned sufficient alternate capacity OR b) a backhoe didn't cut the fiber. It's the combination of insufficient diverse capacity AND a backhoe attack which causes the problem for the customer. Unfortunately, the first instinct is to blame the other guy. So telcos blame the backhoes, and the excavators blame the telcos.