North American Network Operators Group

Re: External Events (was Re: www.etrade.com has no DNS A record !)
I support John in his opinion that NANOG is not an appropriate forum for
real-time outage reporting.

-alan

Thus spake John Hawkinson ([email protected]) on or about Wed, Jan 26, 2000 at 10:48:56AM -0500:
> >
> > On Tue, 25 January 2000, John Hawkinson wrote:
> > > Is your goal to get the word out to network providers of people
> > > who use E*TRADE? Do you really expect that many of them will
> > > forward this announcement or make good use of it? Should
> > > a message be sent to NANOG every time CNN, Netscape, or Yahoo
> > > go down?
> > >
> > > Am I missing something here? [Like a sense of humor?]
> >
> On Tue, Jan 25, 2000 at 08:40:46PM -0800, Sean Donelan wrote:
>
> > External events have an effect on network service and network operators.
> > Why do most NOCs have one or more monitors tuned to CNN and the Weather
> > channel all day and all night? Ok, I know the real reason, but what is
> > the reason the sales people tell prospective clients?
> >
> > The question is really one of editorial policy and how significant is
> > any individual event. I don't think there is really one answer which
> > can cover everything.
>
> This is true. That is part of why I asked the question the way I did:
>
> > > Is your goal to get the word out to network providers of people
> > > who use E*TRADE? Do you really expect that many of them will
> > > forward this announcement or make good use of it? Should
> > > a message be sent to NANOG every time CNN, Netscape, or Yahoo
> > > go down?
>
> While most people interpreted it rhetorically, it was actually
> asked with a significant literal component. When asking the
> list a question like this, though, it's hard to know how to
> contend with the potential silent majority versus the exuberant
> minority (I've heard from some people who agreed with the
> position I espoused).
>
> It appears that there is a significant population among
> the NANOG readership who benefit from this sort of notification.
> Personally, I believe that the notification is useful and valuable;
> however, my opinion is mostly that NANOG is not the right place for it.
>
> This is an opinion I have held for a long time, and it was solidified
> back when a mailing list called [email protected] existed. I believe it
> stood for "Network Status Reporting". It's awful hard to find archives
> of it any more (hey, Merit!), but google.com has one message cached
> which demonstrates the flavor:
>
> | To: [email protected]
> | Subject: 07/22/94 NSFNET Backbone Unreachable 10:00 - 11:00 UTC
> | From: ANS Network Operations Center <[email protected]>
> | Date: Fri, 22 Jul 1994 11:19:20 GMT
> |
> | 07/22/94 NSFNET Backbone Unreachable 10:00 - 11:00 GMT.
> |
> | At 10:00 UTC gated exited on all core routers and ENSS's.
> | All networks announced by NSFNET sites were unreachable or
> | experienced varying degrees of instability during this window
> | while gated was restarted across the NSFNET backbone. The
> | cause of this outage is currently being pursued by our engineers.
> |
> | Stephen Powell
> | ANS Network Operations Center
>
> Well, though in many cases notifications were sent to nsr about
> circuit outages and individual ENSS outages. I believe the
> charter of the list said that it was appropriate for all sorts
> of outage reporting, not simply NSFnet backbone reports; however,
> I seem to rarely remember that ever happening, even then.
>
> Similarly, the Internet Monthly Report from Anne Cooper at ISI
> would summarize notable events and regionals (and anybody else,
> it seemed) would submit monthly reports of significant events.
>
>
> You didn't see discussion of high-level issues on the NSR list,
> and that was the right thing; issue-discussion was separate from
> operational notification. I find that separation to be
> incredibly useful.
> Perhaps it is because at this point I deal
> less with day-to-day operational issues (company scaling), but
> I think even in the heyday I would have felt the same.
>
> Bill Simpson points out:
>
> / In the case of a small rural ISP with less than 4,000 customers, an
> / amazing number of folks called about our "problem", and the NANOG list
> / is just about the first place I look for a heads up or explanation.
>
> And of course, NANOG doesn't carry information about most of these outages,
> and while I think it should not, that doesn't mean I think that
> those outages should go unreported.
>
> I would propose that we consider creating a mechanism for that sort
> of outage reporting. It seems to me that there are two broad categories:
>
> a) Official outage reporting from the organization experiencing the
>    outage
> b) Unofficial outage reporting from someone affected by the outage.
>
> Both are valuable and occur in different ways, and unfortunately it is
> the case that in today's business climate, the latter is likely to be
> more accurate and detailed.
>
> The obvious implementations that occur to me are i) A mailing list
> like NSR; just bring it back, potentially moderate it to ensure that
> the usage is consistent with the charter, and redirect postings from
> NANOG to such a list. ii) A web-based format where people can note
> outages, and comment on them usefully (perhaps a la slashdot?).
>
> I think both of those ideas could work, though both have been tried
> and not worked very well for various reasons [whatever happened to
> [email protected]?].
>
>
> I would ask, however, that someone *not* take this message as the impetus
> to go out and set up such a thing, but instead try to listen to
> reasoned discussion and coordinate it with the community.
>
>
> Back to Sean:
> > The Internet (RTM) worm affected only VAX and Sun computers, an estimated
> > 10% of the Internet of the day.
> > If you didn't use Sun or VAXen, it would
> > have been an irrelevant event for you.
>
> Not only that, it affected *hosts* (unless, of course, you were using
> Suns or VAXen as gateways, as I'm sure many people were). Surely hosts
> are outside the scope of NANOG? ;-)
> Seriously, though, I think it is terribly unfair to compare something like
> an Internet-wide worm to a simple DNS misconfiguration. The latter is one
> person's problem and can be fixed with a quick phone call to the right person
> (assuming you can find that person, 20 phone calls later), whereas the former
> is a huge management problem that cannot be easily dealt with.
>
> > When AOL forgot to put a GUARDIAN password on its domains, and they
> > were changed to a tiny ISP, if you didn't use AOL it may have been
> > irrelevant to you.
>
> For the most part, yes, though I believe that this caused real operational
> effects for large volumes of mail queued on mail servers of network providers
> in North America, and so was operationally relevant. Failed DNS queries
> to E*TRADE just don't have the same level of visibility. They may affect
> customers equally, but they affect providers not at all.
>
> > When Cisco, Bay and GATED BGP implementations had a disagreement on
> > whether ASNs could be repeated in an as-path, it may have been
> > irrelevant to you if you used a different BGP implementation or
> > router.
>
> You're being really off-the-wall here. It's quite clear that a statistically
> significant fraction of North American network operators use those implementations,
> so discussion is merited. Especially because there is *something* to discuss,
> not merely "Oh, look, it's broken. We can now wait until they fix it."
>
> > Whether a particular NSI problem, an E*Trade problem, or an eBay problem,
> > or a Cisco CCO problem is really significant enough to talk about
> > semi-publicly is tough. It would be nice if each company was willing to
> > make timely disclosures about problems.
>
> E*TRADE's annual report for 1999 makes some disclosures about infrastructure
> failures, by the way.
>
> > But as we've seen time and time again, companies would prefer
> > never to acknowledge they had any problem until it becomes
> > impossible to ignore (e.g. Worldcom's 10 days of hell last summer).
>
> Indeed. Just because they should be reported doesn't mean they should
> be reported to NANOG.
>
> I think outage notification and operational issue discussion are
> different things and should go to different places.
>
> That worked well for the NSFnet with [email protected] split from
> [email protected], and the Internet has only grown since then,
> and the scaling benefits would be much more sizable.
>
> Opinions?
>
> --jhawk