North American Network Operators Group

Provider communications, is it time yet?

  • From: Sean Donelan
  • Date: Wed Mar 11 05:58:38 1998

Has your management received enough shocks to the system to make talking
about provider communications worthwhile yet?  Or are they still in a
state of denial?  Or have the shocks not been big enough to be noticed?

The last couple of months have seen many of the same types of outages
over and over again, yet provider communications during these outages
seem minimal, or even non-existent.  Having an outage list is a nice
thing for your sales people to say, but if no information is given
during an outage, what was the point of setting up the list?

Hint: in the absence of pro-active notifications, more and more people
are going to start their own active monitoring, i.e. your most critical
machines are going to get pinged to death by more and more people checking
if those critical machines are up.  I wonder what Netscape's current ping
rate is?

Acts of God and other natural disasters seem to get modest reporting.
But anything else seems to be treated like a tree falling in the forest:
if I don't say anything, maybe no one will notice it fell down.

That works great, sometimes.  But it also means we have no way to
collect reliable data on how well the "Internet" is doing.  Which means
every once in a while an acorn falls out of a tree, and hits a Chicken
Little reporter on the head, and we end up with stories on TV about how
the sky is falling.  Then we get called into our management's office to
explain what we plan on doing about the sky falling.

Overall, most of us, I hope, understand the sky is not falling.  But
as an industry we have no data to back that up.  If one provider goes
off the air for eight hours, they may have shot a big hole in their
99.9% reliability.  But what does that mean for the Internet as a
whole?  How much customer traffic was affected?  Do we even know
how much customer traffic exists on the Internet as a whole?  The
so-called denominator problem.  In other words when one provider
carrying 10% or 90% of the Internet traffic has problems, does it
affect the total amount of Internet traffic, or is a substantial
portion of the traffic re-routed around the damaged provider?
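
As a rough worked example of the reliability arithmetic above: the
sketch below assumes the 99.9% figure is measured over a calendar year,
which is my assumption since no measurement period is stated.  The
function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap calendar year


def downtime_budget_hours(availability, period_hours=HOURS_PER_YEAR):
    """Hours of outage the period can absorb at the given availability."""
    return (1.0 - availability) * period_hours


def achieved_availability(outage_hours, period_hours=HOURS_PER_YEAR):
    """Availability actually achieved after the given hours of outage."""
    return 1.0 - outage_hours / period_hours


budget = downtime_budget_hours(0.999)
after_outage = achieved_availability(8.0)

print(f"99.9% over a year allows {budget:.2f} hours of downtime")
print(f"one 8-hour outage leaves {after_outage:.4%} availability")
print(f"budget left for the rest of the year: {budget - 8.0:.2f} hours")
```

If the period were a 30-day month instead, the same call with
period_hours=30 * 24 gives a budget well under one hour, so a single
eight-hour outage would miss the target by a wide margin.  None of this
says anything about the Internet-wide denominator, only about the one
provider's own number.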

I've actually been a bit surprised just how little my total traffic
volume has been affected even when major providers or exchange points
go off the air for periods of time.  Maybe we've achieved TYMNET/TELENET
nirvana, and any useful site on the net is connected to all of the
IRCs :-).

Q: What's the worst thing that can happen to a backbone?
A: Go down, and no one notice.

Fiber cuts, operator errors, software bugs, etc. are normal network
events.  For the Internet, the unusual thing is supposed to be when a
single event has a disruptive effect on more than just that one system.

With only modest pre-planning, and good communications between providers,
the Internet continues flapping on its merry way through most of these
events.  The myth of the nuclear bomb and the Internet is somewhat true:
the net will keep working, but routing is going to be sub-optimal at the
node which gets nuked.

The problem is that as an industry we don't seem to have a good grasp of
exactly what is the root cause of most of the problems.  Sure, you can
say fiber cuts are the number one cause of Internet outages, but how
do you know?  Maybe it's power failures, or configuration errors, or
any number of other things.  And my biggest question, why do single
events seem to have such a widespread effect on the Internet?  Is it
a design error, or an assembly error?

Second hint: if you only have two name servers, and they are both on the
same subnet, your life is going to suck when that subnet gets nuked.
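
A minimal sketch of how one might check for this, using Python's
standard ipaddress module.  The /24 prefix length is an assumption (use
whatever mask your subnets really have), the function names are
hypothetical, and the addresses are documentation-range examples, not
real name servers.

```python
import ipaddress
from collections import defaultdict


def nameservers_by_subnet(addresses, prefix_len=24):
    """Group IPv4 name server addresses by their enclosing subnet."""
    groups = defaultdict(list)
    for addr in addresses:
        # strict=False accepts a host address and returns its network.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[net].append(addr)
    return dict(groups)


def all_on_one_subnet(addresses, prefix_len=24):
    """True when every listed name server sits on the same subnet."""
    return len(nameservers_by_subnet(addresses, prefix_len)) == 1


# Example addresses only (RFC 5737 documentation ranges).
risky = ["192.0.2.10", "192.0.2.53"]
safer = ["192.0.2.10", "198.51.100.53"]

print("risky pair shares a subnet:", all_on_one_subnet(risky))
print("safer pair shares a subnet:", all_on_one_subnet(safer))
```

Being on different subnets is only a first check.  Two subnets behind
the same router or the same fiber strand fail together just as well.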

Spring planting and backhoe season is just around the corner.  Do you
have a second fiber strand?

Ok, what's this got to do with provider communications?

Q: Find the value of the Internet traffic denominator, so that statements
	like "40% of the Internet traffic" make sense.
A: Get all Internet providers to give KC? the inbytes/outbytes of their
	network cloud for anonymization and consolidation.

Q: What are the top three causes of Internet outages?  NANOG did the
	top three problems, but not the causes.
A: Show me your trouble tickets.... Ok, how to normalize the data?

Q: Do you want active pinging or provider pro-active notification?
A: Software is Internet law.  Develop an Internet (not single-point)
	status board for providers that don't cooperate.  A software
	message system for providers that do cooperate, i.e. spin
	control.  Decentralize, and provider control.

Q: How good are the alternate paths in the Internet?
A: Unfortunately we don't get to find out until the disaster.

Q: Who reports Internet outages first?  CNN or the provider?
A: CNN seems to still be reporting them first.
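
For the denominator question above, a rough sketch of the consolidation
step might look like the following.  The provider labels and byte
counts are invented for illustration, and the function name is
hypothetical.  It also naively sums every provider's counters, so
traffic handed from one provider to another is counted by both; a real
consolidation would have to deal with that.

```python
def traffic_shares(reports):
    """Return each provider's share of the total reported bytes.

    reports maps an anonymized provider label to (inbytes, outbytes).
    """
    totals = {name: inb + outb for name, (inb, outb) in reports.items()}
    denominator = sum(totals.values())
    if denominator == 0:
        raise ValueError("no traffic reported")
    return {name: total / denominator for name, total in totals.items()}


# Made-up numbers, in arbitrary byte units.
reports = {
    "provider-1": (300, 100),
    "provider-2": (200, 200),
    "provider-3": (100, 100),
}

shares = traffic_shares(reports)
for name, share in sorted(shares.items()):
    print(f"{name}: {share:.0%} of reported traffic")
```

Only with that total in hand does a statement like "40% of the Internet
traffic" have something to be 40% of.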
-- 
Sean Donelan, Data Research Associates, Inc, St. Louis, MO
  Affiliation given for identification not representation