North American Network Operators Group

Re: If the network is always down, how come I can post a message

  • From: Tony Tauber
  • Date: Wed Nov 01 13:56:18 2000

On 31 Oct 2000, Sean Donelan wrote:
> 
> One anecdotal data point, I've been reporting about Internet problems
> for the last five years or so. Over the last 5 years no Internet network
> event has been so severe it prevented me from reporting about the problem
> on the net.  In a strange way, my postings about the problems on the net
> are proof of the reliability of the same network.
 
> In the same time period, I've lost my telephone service several times.
 
> I've lost my pager service multiple times

> Even the Associated Press has gone down in the last five years.
> 
> So I'm sick and tired about hearing the telephone network is 99.999%
> reliable and the Internet isn't.

Me, too.  I'll put some finer points on the topic, though.

The services you cite (phone, paging, AP) have essentially one
application from a user point of view.
The Internet has thousands of different applications at the 
user level.  They handle this or that "outage" differently.
Email is particularly robust thanks to its store-and-forward behavior:
a bounce is the only failure mode that readily comes to mind
(i.e., any mail that's eventually delivered is successful).

Other applications may behave differently.  Adding complexity
to the analysis, the number of end nodes means that
the matrix of possible src/dest pairs quickly climbs into the billions,
with port and protocol multiplexing on top of that.
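
To put a rough number on that, here is a minimal sketch. The host count
is purely an illustrative assumption, not a measurement, and
ordered_pairs is a hypothetical helper name.

```python
def ordered_pairs(n_hosts: int) -> int:
    """Number of ordered (src, dest) pairs among n_hosts distinct end nodes."""
    return n_hosts * (n_hosts - 1)

# Assumed, illustrative host count; not a measured figure.
hosts = 50_000
print(ordered_pairs(hosts))  # 2499950000, already in the billions
```

Ports and protocols multiply this figure further.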

Thinking of things this way, it seems clear that there's no way
to measure the "up-ness" of any part of the Internet that's not
been isolated by some outage at a local entry point.  That is,
unless its Ethernet cable is unplugged or the WAN link out
of the building goes down, a given machine seems to be "up," as does the
larger network, but we can be sure there's something somewhere it
can't get to and some application that would be affected.
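
A back-of-the-envelope sketch of why something is always unreachable.
It rests on a strong and surely unrealistic assumption: that paths fail
independently and each one is "five nines" available. The function name
and the numbers are illustrative assumptions only.

```python
import math

def prob_all_paths_up(per_path_availability: float, n_paths: int) -> float:
    """Chance that every one of n_paths is up at once, assuming independence."""
    # Computed via logs to stay numerically stable for large n_paths.
    return math.exp(n_paths * math.log(per_path_availability))

# Assumed: one million paths, each 99.999% available.
print(prob_all_paths_up(0.99999, 1_000_000))
```

Even at a million paths, far short of billions, the chance that
everything is reachable at once comes out well under one in ten thousand.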

Given this, measurements based on 9's seem particularly ill-suited
and any metric that's not *extremely* narrowly defined seems
incalculable.
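
For reference, this is what a count of 9's translates to once "up" has
been pinned down for a single, narrowly defined service. A sketch only:
the helper name is hypothetical and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year

def downtime_minutes_per_year(nines: int) -> float:
    """Downtime budget per year for an availability with `nines` nines (5 -> 99.999%)."""
    unavailability = 10.0 ** (-nines)
    return MINUTES_PER_YEAR * unavailability

for n in (3, 4, 5):
    print(n, round(downtime_minutes_per_year(n), 3))
```

The arithmetic is trivial; the hard part is saying which of the src/dest
pairs and which application the figure is supposed to cover.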

How to explain this to customers though?

One possible approach would be to remind them that the way most users
approach Internet applications approximates, "Oh, it's not working now,
time for a coffee break."  Nothing seems to stop people from building
inappropriate applications on top of IP, though.  (While I might consider
it folly to depend on a web page to send sell orders for my stocks when
the market's crashing, can I be sure that whatever other approach might
be in the back of my mind would work in a "bad time"?)

I'd like to see more discussion in this forum of new ways to think
about risk and communicate it to those outside the technical IP
community (e.g., managers and customers).

Tony