North American Network Operators Group


Re: Limits of reliability or is 99.999999999% realistic

  • From: Robert Cooper (by way of Robert Cooper <[email protected]>)
  • Date: Mon Nov 27 14:51:50 2000

At 08:24 PM 11/25/00 -0800, Sean Donelan <[email protected]> wrote:

>But back to my question.  What is the real requirement?  Amazon.COM had
>system problems on Friday, and their site was unusable for 30 minutes,
>definitely not 99.999%.  But what did that really mean?  The FAA loses
>its radar for several hours in various parts of the country.  What did
>that really mean?  Essentially every system given as an example of "high-
>availability, high-reliability" I've looked at, doesn't hold up under
>close examination.
>
>Is 99.999% just F.U.D. created by consultants?
>
>Instead of pretending we can build systems which will never fail, should
>we work on a realistic understanding of what can be delivered?

For some actual *data* on reliability in the US phone system take a look at: 

"Sources of Failure in the Public Switched Telephone Network" 
D. Richard Kuhn
IEEE Computer Magazine, April 1997, Vol 30, No 4, pp. 31-36.

OK, it's not actual data but a summary of it. It's culled from data that the phone companies have to supply to the feds for every outage that affects more than 30,000 subscribers, which is about the number supported by a central office. Measured in user-outage-minutes (duration of outage * number of subscribers), he shows availability in the range of 99.999% for data from 1992 and 1993.

More interesting is the summary of the sources of outages and how much they contribute to the total picture. For instance, if you were one of the average of 250,000 customers who lost service for an average of 500 minutes due to vandalism, you might not think 99.999% means very much. 
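To make the arithmetic concrete, here is a rough sketch of the user-outage-minutes calculation. The 250,000 customers and 500 minutes are the vandalism averages quoted above; the total subscriber count is a made-up round number of mine, not a figure from the paper, and the function names are just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability(outages, total_subscribers, period_minutes=MINUTES_PER_YEAR):
    """Availability from (duration_minutes, subscribers_affected) pairs.

    Lost service is summed in user-outage-minutes and divided by the
    total user-minutes available in the period.
    """
    lost = sum(duration * affected for duration, affected in outages)
    return 1.0 - lost / (total_subscribers * period_minutes)


def downtime_budget(target, period_minutes=MINUTES_PER_YEAR):
    """Minutes of outage per period allowed by an availability target."""
    return (1.0 - target) * period_minutes


# ASSUMPTION: hypothetical subscriber base, chosen only for illustration.
TOTAL_SUBSCRIBERS = 150_000_000

# One vandalism incident at the averages quoted above.
vandalism = [(500, 250_000)]

# View 1: averaged over every subscriber in the (hypothetical) network.
network_wide = availability(vandalism, TOTAL_SUBSCRIBERS)

# View 2: a single customer who was caught in that outage.
one_customer = availability([(500, 1)], total_subscribers=1)

print(f"five-nines budget: {downtime_budget(0.99999):.3f} min/year")
print(f"network-wide:      {network_wide:.6%}")
print(f"affected customer: {one_customer:.4%}")
```

Under those assumptions the network-wide figure stays above five nines, while the customer who sat through the 500 minutes used up far more than the roughly five minutes a year that 99.999% allows.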

On the other hand, the data is somewhat conservative since it counts the total number of subscribers, not the number of active or would-be active users during the outage. The data does include overloads, which in the PSTN manifest themselves through call admission control (e.g. a network busy signal).

What data exists for the Internet?

[ Standard disclaimer ] 

Robert Cooper
Ironbridge Networks
55 Hayden Ave, Lexington MA 02421
www.ironbridgenetworks.com