North American Network Operators Group


Re: Ph.D. student looking for data on network failure causes

  • From: David L. Oppenheimer
  • Date: Tue Apr 09 18:22:50 2002

By the way, just a clarification of my original message: the results of this
study will (eventually) be published in some academic forum. I'll post to
the NANOG mailing list a pointer to any results when they are available, so
there's no need to email me separately to indicate your interest in the
results. (Interest in providing data, on the other hand, would be most
welcome!)

Thanks,
David


> Hello network operators! I'm a Ph.D. student at UC Berkeley working for
> Dave Patterson on the ROC project, which is investigating techniques for
> improving the availability and manageability of large-scale Internet
> services and systems. I'm currently conducting a study of the root causes
> (hardware, software, human, etc.) and durations of failures in such
> systems. To do this, I have been examining the operations trouble ticket
> databases from several large-scale Internet services (of the Hotmail,
> eBay, Yahoo!, etc. type).
>
> In doing this research, it has become apparent that for many services
> (especially geographically distributed ones, e.g. those that use multiple
> colocation facilities), a major cause of problems is failures of various
> types in the Internet. Thus I've become interested in finding out the
> types and root causes of problems in wide-area networks, e.g. within the
> kinds of large-scale ASes that are administered by the folks on this
> list. I'm not sure how your services track failures and problems; the
> problem tracking databases at the services I've examined have been a
> great source of data about problem scale, symptoms, root causes,
> durations, steps (and missteps!) taken in diagnosing and fixing problems,
> etc.
>
> I'm writing to the list because I'm very interested in working with
> network operators to study the causes of failures in large networks. I
> realize this type of data is very sensitive to your organizations. I
> would be happy to talk offline with anyone who is interested in the
> possibility of sharing data, about how I've overcome the multitude of
> objections that have been raised by folks I have solicited for data
> (protecting their customers' privacy, securing datasets when they are not
> examined on the premises of the services, anonymizing and aggregating
> data in reporting, etc.). I'm interested in the relative causes of
> failures, *not* overall availability numbers. As a result of the
> precautions we've taken, several household-name Internet services have
> allowed me to examine and report on the problems their services have
> experienced.
>
> If you're interested in discussing the possibility of sharing access to
> this kind of data about your service, please contact me. I'm willing to
> examine data on the premises of your service, to anonymize it fully, to
> submit any results I want to publish to your organization prior to
> publication, to sign any necessary NDAs, etc. In return, I'm happy to
> share with you any insights I have about the problems your service
> experiences, and you'll contribute to the world's knowledge of why bad
> things happen to good networks. :-)
>
> If you're not the right person in your organization to contact with this
> request, but you think your organization might be interested in
> participating in this study, perhaps you could forward this email to the
> appropriate person or let me know who the right person to contact in your
> organization would be.