North American Network Operators Group


Re: outages, quality monitoring, trouble tickets, etc

  • From: Sean Donelan
  • Date: Fri Nov 24 04:53:31 1995

From: [email protected] (Alan Hannan)
>  Hmm, I wonder if the Trib' would be interested in knowing when the
>  DS3 from Pennsauken is down.....

Sometimes the Trib would be interested, most of the time they wouldn't.
Who cares if a central office in Hinsdale, IL burns down?  Some people
thought it was front page news.

>  At a cursory glance I would agree with you.  However, let us
>  delve into this a bit deeper.  Donning my idiot hat may I point out
>  that the _most_ important thing is network reliability -Period-.

That would be great; please give me the name of a network provider
that provides perfect network reliability.

In the absence of perfection, please tell me what went wrong when I
can't get an expected level of usability out of the network.  You can
greatly reduce your customers' stress levels simply by keeping them
informed. Give me the TCP/IP equivalent of "*beep* *boop* *BEEP* We're
sorry your TCP/IP connection cannot be completed due to an earthquake
(software glitch, route table overload, nuclear detonation) in the area.
Please hang up and try your call later."

With an accurate RA database, and a little magic, the route servers
could redirect connections to an intercept message.  That should
send a shiver up the spine of your network security folks.

It would be nice if the problem were also fixed quickly, but I realize
that is asking for a lot.  In the meantime, keep the customer informed.
As the Internet has grown, keeping the customer informed has become
a bigger job.  Relying on a 1-800 number doesn't work when a large
NSP's backbone melts down and all of that NSP's customers call the NOC
at the same time.

>  Your page at DRA is quite good, however the consensus among
>  upper management (not just at our site) is "Why should other
>  people know when we're broke?".  And the sad thing is, I am
>  tempted to agree with them.

Thanks for the compliment.  I would point out to your upper management
that other people already know when your network is broken.  If they didn't
notice, it wouldn't be a problem.  If a network falls over in the woods,
and there is no one to hear it, does it make a sound?

Tell your upper managers that the only time people don't know your
network is broken is when your network is irrelevant to their work.  I
don't know about you, but if I were managing an irrelevant network, I
would be working on my resume.  Maybe that's why so many people in
this business keep switching employers? :-)

>  Do you really want outage and downtime on public record, or do you
>  want easier access to clueful folx?

As a network user (operator, manager):

   - Ideally I want a usable network.
   - When I can't use the network, I want an explanation.
   - I want the problem fixed so I can use the network again.

How you meet those needs, I don't care.  If you fix the problem
before I'm affected by it, then I don't care about the intermediate
steps either (tree falling in the forest).  If I can get the explanation
from an automated server, then I don't have to bother your clueless or
clueful folk.  If your clueless folk can take a log and get it resolved,
I don't have to bother your clueful folk.

When I need to dig out my magic cache of business cards and start
e-mailing/calling the secret members of the "backbone cabal" to get
a problem fixed, I consider it a failure of the process.  That is
what I meant when I said NOC-to-NOC communication has been a long-term
Internet problem.  It relies on personal contacts rather than a
reliable process.  I try very hard not to handle problems by directly
contacting people I know at other NOCs.  I prefer handling problems through
the normal NOC channels.  If you do happen to receive a phone call
directly from me about a network problem, something has gone very wrong.
--
Sean Donelan, Data Research Associates, Inc, St. Louis, MO
  Affiliation given for identification not representation