North American Network Operators Group

Re: Resilience: faults, causes, statistics, open issues
On Jan 27, 2005, at 6:39 AM, András Császár (IJ/ETH) wrote:

This is self-serving, but see the intro and related work sections of my thesis. We'll have a conference paper version of it done soon for NSDI, but we're still revising it; apologies for not having a shorter reference to give you:

http://nms.lcs.mit.edu/papers/index.php?detail=113

It doesn't focus specifically on carrier failures, but it has a batch of references that might get you started on what the academic side knows. I've also got some refs in there to some of the earlier telco studies, which I recommend taking a peek at. Again, their relation to 2005-era ISP failures isn't totally clear, but it's a starting point.

Unfortunately, the reality is that we don't actually know all that much about what's _really_ happening!

Nick Feamster and I took a look at some of the BGP routing failures (but didn't get back to root causes):

http://nms.lcs.mit.edu/papers/index.php?detail=23

Nick has also done some work on configuration management and on building a better routing protocol that's somewhat related to your question.

Ratul Mahajan examined BGP configuration errors, but it's not clear exactly what fraction of failures or downtime is really due to those errors:

http://www.cs.washington.edu/homes/ratul/bgp/index.html

David Oppenheimer studied failures at a few edge companies (application service providers, hosting providers, etc.). The paper has a nice breakdown of failure causes and durations, but it's not clear whether those numbers translate directly to the carrier realm:

http://roc.cs.berkeley.edu/papers/usits03.pdf

Finally, search back through Sean Donelan's NANOG posts. You'll get some good individual cases from those, though the last time I looked, I didn't find a big overall analysis.

> Also, do you have any suggestions on open research issues to be solved in the area?

Most of it. :)

I (and probably others on this list) would be interested in what you find.

  -Dave