North American Network Operators Group

Re: External Events (was Re: www.etrade.com has no DNS A record !)

  • From: Alan Hannan
  • Date: Sat Jan 29 15:21:44 2000

  I support John in his opinion that NANOG
  is not an appropriate forum for real-time
  outage reporting.

  -alan

Thus spake John Hawkinson ([email protected])
 on or about Wed, Jan 26, 2000 at 10:48:56AM -0500:
> 
> 
> > On Tue, 25 January 2000, John Hawkinson wrote:
> > > Is your goal to get the word out to network providers of people
> > > who use E*TRADE? Do you really expect that many of them will
> > > forward this announcement or make good use of it? Should
> > > a message be sent to NANOG every time CNN, Netscape, or Yahoo
> > > go down?
> > > 
> > > Am I missing something here? [Like a sense of humor?]
> 
> 
> On Tue, Jan 25, 2000 at 08:40:46PM -0800, Sean Donelan wrote:
> 
> > External events have an effect on network service and network operators.
> > Why do most NOCs have one or more monitors tuned to CNN and the Weather
> > channel all day and all night?  Ok, I know the real reason, but what is
> > the reason the sales people tell prospective clients?
> > 
> > The question is really one of editorial policy and how significant is
> > any individual event.  I don't think there is really one answer which
> > can cover everything.
> 
> This is true. That is part of why I asked the question the way I did:
> 
> > > Is your goal to get the word out to network providers of people
> > > who use E*TRADE? Do you really expect that many of them will
> > > forward this announcement or make good use of it? Should
> > > a message be sent to NANOG every time CNN, Netscape, or Yahoo
> > > go down?
> 
> While most people interpreted it rhetorically, it was actually
> asked with a significant literal component. When asking the
> list a question like this, though, it's hard to know how to
> contend with the potential silent majority versus the exuberant
> minority (I've heard from some people who agreed with the
> position I espoused).
> 
> It appears that there is a significant population among
> the NANOG readership who benefit from this sort of notification.
> Personally, I believe that the notification is useful and valuable;
> however, my opinion is mostly that NANOG is not the right place for it.
> 
> This is an opinion I have held for a long time, and it was solidified
> back when a mailing list called [email protected] existed. I believe it 
> stood for "Network Status Reporting". It's awful hard to find archives
> of it any more (hey, merit!), but google.com has one message cached
> which demonstrates the flavor:
> 
> | To: [email protected] 
> | Subject: 07/22/94 NSFNET Backbone Unreachable 10:00 - 11:00 UTC 
> | From: ANS Network Operations Center <[email protected]> 
> | Date: Fri, 22 Jul 1994 11:19:20 GMT 
> | 
> | 07/22/94 NSFNET Backbone Unreachable 10:00 - 11:00 GMT.
> | 
> | At 10:00 UTC gated  exited on all core routers and ENSS's.
> | All networks announced by NSFNET sites were unreachable or 
> | experienced varying degrees of instability during this window
> | while gated was restarted across the NSFNET backbone.  The
> | cause of this outage is currently being pursued by our engineers.
> | 
> | Stephen Powell
> | ANS Network Operations Center
> 
> In many cases, though, notifications were sent to nsr about
> circuit outages and individual ENSS outages. I believe the
> charter of the list said that it was appropriate for all sorts
> of outage reporting, not simply NSFnet backbone reports; however,
> I seem to remember that rarely happening, even then.
> 
> Similarly, the Internet Monthly Report from Anne Cooper at ISI
> would summarize notable events and regionals (and anybody else,
> it seemed) would submit monthly reports of significant events.
> 
> 
> You didn't see discussion of high-level issues on the NSR list,
> and that was the right thing; issue-discussion was separate from
> operational notification. I find that separation to be
> incredibly useful. Perhaps it is because at this point I deal
> less with day-to-day operational issues (company scaling), but
> I think even in the heyday I would have felt the same.
> 
> Bill Simpson points out:
> 
> / In the case of a small rural ISP with less than 4,000 customers, an
> / amazing number of folks called about our "problem", and the NANOG list
> / is just about the first place I look for a heads up or explanation.
> 
> And of course, NANOG doesn't carry information about most of these outages,
> and while I think it should not, that doesn't mean I think that
> those outages should go unreported.
> 
> I would propose that we consider creating a mechanism for that sort
> of outage reporting. It seems to me that there are two broad categories:
> 
> a) Official outage reporting from the organization experiencing the
>    outage
> b) Unofficial outage reporting from someone affected by the outage.
> 
> Both are valuable and occur in different ways, and unfortunately it is
> the case that in today's business climate, the latter is likely to be
> more accurate and detailed.
> 
> The obvious implementations that occur to me are i) A mailing list
> like NSR; just bring it back, potentially moderate it to ensure that
> the usage is consistent with the charter, and redirect postings from
> NANOG to such a list. ii) A web-based format where people can note
> outages, and comment on them usefully (perhaps ala slashdot?).
> 
> I think both of those ideas could work, though both have been tried
> and not worked very well for various reasons [what ever happened to
> [email protected]?].
> 
> 
> I would ask, however, that someone *not* take this message as the impetus
> to go out and set up such a thing, but instead try to listen to
> reasoned discussion and coordinate it with the community.
> 
> 
> Back to Sean:
> > The Internet (RTM) worm affected only VAX and Sun computers, an estimated
> > 10% of the Internet of the day.  If you didn't use Sun or VAXen, it would
> > have been an irrelevant event for you.
> 
> Not only that, it affected *hosts* (unless of course, you were using
> Suns or VAXen as gateways, as I'm sure many people were). Surely hosts
> are outside the scope of nanog? ;-)
> Seriously, though, I think it is terribly unfair to compare something like
> an Internet-wide worm to a simple DNS misconfiguration. The latter is one
> person's problem and can be fixed with a quick phone call to the right person
> (assuming you can find that person, 20 phone calls later), whereas the former
> is a huge management problem that cannot be easily dealt with.
> 
> > When AOL forgot to put a GUARDIAN password on its domains, and they
> > were changed to a tiny ISP, if you didn't use AOL it may have been
> > irrelevant to you.
> 
> For the most part, yes, though I believe that this caused real operational
> effects for large volumes of mail queued on mail servers of network providers
> in North America, and so was operationally relevant. Failed DNS queries
> to E*TRADE just don't have the same level of visibility. They may affect
> customers equally, but they affect providers not-at-all. 
> 
> > When Cisco, Bay and GATED BGP implementations had a disagreement on
> > whether ASNs could be repeated in an as-path, it may have been
> > irrelevant to you if you used a different BGP implementation or
> > router.
> 
> You're being really off-the-wall here. It's quite clear that a statistically
> significant fraction of North American network operators use those implementations,
> so discussion is merited. Especially because there is *something* to discuss,
> not merely "Oh, look, it's broken. We can now wait until they fix it."
> 
> > Whether a particular NSI problem, an E*Trade problem, or an Ebay problem,
> > or a Cisco CCO problem is really significant enough to talk about semi-
> > publicly is tough.  It would be nice if each company was willing to
> > make timely disclosures about problems.
> 
> E*TRADE's annual report for 1999 makes some disclosures about infrastructure
> failures, by the way.
> 
> > But as we've seen time and time again, companies would prefer
> > never to acknowledge they had any problem until it becomes
> > impossible to ignore (e.g. Worldcom's 10 days of hell last summer).
> 
> Indeed. Just because they should be reported doesn't mean they should
> be reported to NANOG.
> 
> I think outage notification and operational issue discussion are
> different things and should go to different places.
> 
> That worked well for the NSFnet with [email protected] split from
> [email protected], and the Internet has only grown since then,
> and the scaling benefits would be much more sizable.
> 
> Opinions?
> 
> --jhawk