North American Network Operators Group
Provider communications, is it time yet?
Has your management received enough shocks to the system to make talking about provider communications worthwhile yet? Or are they still in a state of denial? Or have the shocks not been big enough to be noticed?

The last couple of months have seen many of the same types of outages over and over again, yet provider communications during these outages seem minimal, or even non-existent. Having an outage list is a nice thing for your sales people to mention, but if no information is given during an outage, what was the point of setting up the list?

Hint: in the absence of proactive notifications, more and more people are going to start their own active monitoring, i.e. your most critical machines are going to get pinged to death by more and more people checking whether those machines are up. I wonder what Netscape's current ping rate is?

Acts of God and other natural disasters seem to get modest reporting. But anything else seems to be treated like a tree falling in the forest: if I don't say anything, maybe no one will notice it fell down. That works great, sometimes. But it also means we have no way to collect reliable data on how well the "Internet" is doing. Which means every once in a while an acorn falls out of a tree, hits a Chicken Little reporter on the head, and we end up with stories on TV about how the sky is falling. Then we get called into our management's office to explain what we plan to do about the sky falling.

Overall, most of us, I hope, understand the sky is not falling. But as an industry we have no data to back that up. If one provider goes off the air for eight hours, they may have shot a big hole in their 99.9% reliability. But what does that mean for the Internet as a whole? How much customer traffic was affected? Do we even know how much customer traffic exists on the Internet as a whole? This is the so-called denominator problem.
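As a rough illustration of why the denominator matters, here is a minimal Python sketch. The 99.9%, eight-hour, and 10% figures come from the discussion; the per-provider byte counts are hypothetical, and the calculation assumes traffic is spread evenly over the year. The function names are made up for illustration.

```python
from typing import Sequence

# Illustrative sketch only: the byte counts in the usage example are hypothetical.
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored


def downtime_budget_hours(availability: float) -> float:
    """Hours of outage per year permitted by an availability target such as 0.999."""
    return (1.0 - availability) * HOURS_PER_YEAR


def traffic_share(provider_bytes: int, all_provider_bytes: Sequence[int]) -> float:
    """Fraction of total traffic carried by one provider.

    The sum over all providers is the "denominator" nobody currently knows.
    """
    denominator = sum(all_provider_bytes)
    if denominator <= 0:
        raise ValueError("need positive total traffic to compute a share")
    return provider_bytes / denominator


def internet_wide_loss(share: float, outage_hours: float,
                       rerouted_fraction: float = 0.0) -> float:
    """Fraction of a year's total Internet traffic lost to one provider's outage.

    Assumes traffic is spread evenly over the year and that rerouted_fraction
    of the provider's traffic finds an alternate path around the failure.
    """
    return share * (1.0 - rerouted_fraction) * outage_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    budget = downtime_budget_hours(0.999)  # about 8.76 hours per year
    print(f"99.9% budget: {budget:.2f} h; an 8 h outage uses {8 / budget:.0%} of it")

    # Hypothetical inbytes+outbytes totals for three providers.
    share = traffic_share(10, [10, 30, 60])  # this provider carries 10%
    loss = internet_wide_loss(share, outage_hours=8)
    print(f"Share of a year's Internet traffic lost: {loss:.4%}")
```

Under these assumptions, eight hours eats about 91% of the provider's yearly 99.9% downtime budget, yet removes only roughly 0.009% of a year's total traffic, and less if some of it reroutes. The second number cannot be computed at all without real byte counts from every provider.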
In other words, when one provider carrying 10% or 90% of the Internet traffic has problems, does it affect the total amount of Internet traffic, or is a substantial portion of the traffic re-routed around the damaged provider? I've actually been a bit surprised at just how little my total traffic volume has been affected, even when major providers or exchange points go off the air for periods of time. Maybe we've achieved TYMNET/TELENET nirvana, and any useful site on the net is connected to all of the IRCs :-).

Q: What's the worst thing that can happen to a backbone?
A: Go down, and no one notices.

Fiber cuts, operator errors, software bugs, etc. are normal network events. For the Internet, the unusual thing is supposed to be when a single event has a disruptive effect on more than just that one system. With only modest pre-planning and good communications between providers, the Internet continues flapping on its merry way through most of these events. The myth of the nuclear bomb and the Internet is somewhat true: the net will keep working, but routing is going to be sub-optimal at the node that gets nuked.

The problem is that as an industry we don't seem to have a good grasp of exactly what the root cause of most of the problems is. Sure, you can say fiber cuts are the number one cause of Internet outages, but how do you know? Maybe it's power failures, or configuration errors, or any number of other things. And my biggest question: why do single events seem to have such a widespread effect on the Internet? Is it a design error, or an assembly error?

Second hint: if you only have two name servers, and they are both on the same subnet, your life is going to suck when that subnet gets nuked. Spring planting and backhoe season is just around the corner; do you have a second fiber strand?

Ok, what's this got to do with provider communications?

Q: Find the value of the Internet traffic denominator, so that statements like "40% of the Internet traffic" make sense.
A: Get all Internet providers to give KC? the inbytes/outbytes of their network cloud for anonymization and consolidation.

Q: What are the top three causes of Internet outages? NANOG did the top three problems, but not the causes.
A: Show me your trouble tickets.... Ok, how do we normalize the data?

Q: Do you want active pinging or proactive provider notification?
A: Software is Internet law. Develop an Internet-wide (not single-point) status board for providers that don't cooperate, and a software message system for providers that do cooperate, i.e. spin control. Keep it decentralized and under provider control.

Q: How good are the alternate paths in the Internet?
A: Unfortunately, we don't get to find out until the disaster.

Q: Who reports Internet outages first, CNN or the provider?
A: CNN still seems to be reporting them first.

--
Sean Donelan, Data Research Associates, Inc, St. Louis, MO
Affiliation given for identification not representation