North American Network Operators Group
Re: Followup British Telecom outage reason
Wandering off the subject of BT's misfortune ...

Sean Donelan wrote:
> On Mon, 26 Nov 2001, Christian Kuhtz wrote:
[...]
>
> > Faults will happen. And nothing matters as much as how you prepare for
> > when they do.
>
> Mean Time To Repair is a bigger contributor to Availability calculations
> than the Mean Time To Failure. It would be great if things never failed.

And Mean Time To Fault Detected (Accurately) is usually the biggest
sub-contributor within Repair, but that's kinda your point.

> But some people are making their systems so complicated chasing the Holy
> Grail of 100% uptime, they can't figure out what happened when it does
> fail.

Similar people pursue the creation of a perpetuum mobile. A strange and
somewhat congruent example I stumbled into recently is:
http://www.sce.carleton.ca/netmanage/perpetum.shtml

Overall simplicity of the system, including its failure detection
mechanisms, and real redundancy are the most reliable tools for
availability. Of course, popping just a few layers out, profit and
politics are elements of most systems.

> Murphy's revenge: The more reliable you make a system, the longer it will
> take you to figure out what's wrong when it breaks.

Hmm.
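
For anyone who wants to put numbers on the MTTR versus MTTF point above, here is a rough sketch. It assumes the usual steady-state formula, availability = MTTF / (MTTF + MTTR), and it assumes repair time can be split into detect, diagnose, and fix phases. The function names and every figure below are made up for illustration; none of them come from the BT outage or from this thread.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability: fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def mttr(detect_hours, diagnose_hours, fix_hours):
    """Hypothetical breakdown of repair time into three phases."""
    return detect_hours + diagnose_hours + fix_hours


# Illustrative numbers only: one failure every 1000 hours of uptime.
MTTF = 1000.0

# Baseline: 3 h to accurately detect the fault, 0.5 h to diagnose, 0.5 h to fix.
baseline = availability(MTTF, mttr(3.0, 0.5, 0.5))

# Option A: double the MTTF, leave repair untouched.
doubled_mttf = availability(2 * MTTF, mttr(3.0, 0.5, 0.5))

# Option B: keep the MTTF, cut only the detection time to 0.5 h.
faster_detection = availability(MTTF, mttr(0.5, 0.5, 0.5))

for label, value in [
    ("baseline", baseline),
    ("doubled MTTF", doubled_mttf),
    ("faster detection", faster_detection),
]:
    print(f"{label:>16}: {value:.5f}")
```

With these invented figures, shaving detection time helps more than doubling the time between failures, because detection dominates the repair window. Different numbers can of course tip the comparison the other way.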