North American Network Operators Group

Re: Monitoring highly redundant operations
On Wed, 24 Jan 2001, Simon Lockhart wrote:

>
> > But he does raise an interesting problem. How do you know if your
> > highly redundant, diverse, etc. system has a problem? With an ordinary
> > system it's easy: it stops working. In a highly redundant system you
> > can start losing critical components, but not be able to tell if
> > your operation is in fact seriously compromised, because it continues
> > to "work."
>
> Indeed. We currently monitor each part of our operation from a monitoring
> station on our network. Under certain conditions, this can give us both
> false positives and false negatives:
>
> - We've lost off-site routing. Our monitoring station can see all our
> nodes okay, so it thinks everything is fine, but no-one else can see them.
>

With our monitoring software we also check a few off-site points (our
interfaces on our uplink routers, and the router after that); this tends
to work well.

> - We've lost routing to just the part of our network with the monitoring
> station on. It reports that everything is down, when in fact stuff is
> working fine for serving the rest of the internet.
>

For that situation, the software we use lets us set dependencies, i.e.
servers A, B and C depend on router Z; if router Z is down, assume servers
A, B and C are unreachable/down (but don't start spewing out alerts about
them). Unfortunately the software is MS-based (Enterprise Monitor, now
named IP monitor, IIRC). I first came across it while working at Xerox;
it resides on the only MS box on our network (beyond customer machines,
and yes, it's kind of an oxymoron, a Windows monitoring box).

> One way we plan to overcome these issues is to locate monitoring stations
> on other ISPs' networks at random places on the internet. If you correlate
> the results from these multiple monitoring stations, then you get a better
> view of what the rest of the internet is seeing.
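A minimal sketch of that correlation, in Python, assuming one station on our own network plus several stations on other networks, each reporting whether a given target answered. Combining the two views separates the failure modes Simon lists: lost off-site routing versus an isolated monitoring station. The `classify` name, the majority quorum and the verdict strings are illustrative assumptions, not features of any particular monitoring package.

```python
def classify(home_up: bool, remote_up: dict[str, bool], quorum: float = 0.5) -> str:
    """Combine one on-net check with checks from stations on other networks.

    home_up:   result from the (hypothetical) monitoring station on our network
    remote_up: station name -> result seen from that off-net station
    quorum:    target counts as externally "up" if MORE than this fraction
               of remote stations can reach it
    """
    if not remote_up:
        raise ValueError("need at least one remote vantage point")
    # True counts as 1, so sum() is the number of stations that saw it up.
    seen_up = sum(remote_up.values()) / len(remote_up) > quorum
    if home_up and seen_up:
        return "up"
    if home_up:
        # We can see it, but most of the rest of the internet cannot
        # (e.g. off-site routing has been lost).
        return "unreachable from outside"
    if seen_up:
        # Only our own station has lost it (e.g. routing to the
        # monitoring station's part of the network is broken).
        return "monitoring station isolated"
    return "down"
```

For example, `classify(True, {"isp-a": False, "isp-b": False, "isp-c": True})` returns `"unreachable from outside"`: the home station is happy, but only one of three outside stations can reach the target. A strict majority is just one choice of quorum; raising it trades sensitivity for fewer alerts caused by a single flaky remote station.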

Some kind of distributed monitoring system would be nice, or failing that,
people who agree to let you add your systems to their monitoring systems
(easily done with some software, not so easily with others). I also do
this to a small extent.

Matthew S. Hallacy
XtraTyme Technologies

> Simon
> --
> Simon Lockhart                      | Tel: +44 (0)1737 839676
> Internet Engineering Manager        | Fax: +44 (0)1737 839516
> BBC Internet Services               | Email: [email protected]
> Kingswood Warren,Tadworth,Surrey,UK | URL: http://support.bbc.co.uk/
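The dependency rule described in the thread (servers A, B and C depend on router Z; if Z is down, treat them as unreachable rather than alerting on each) can be sketched in Python as below. This is a hypothetical illustration of the idea only, not how Enterprise Monitor/IP monitor implements it; the `split_alerts` name, the node names and the dict-based dependency map are assumptions.

```python
def split_alerts(status: dict[str, bool],
                 depends_on: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split failed nodes into (alert, suppressed).

    status:     node name -> True if that node's own check passed
    depends_on: node name -> name of the upstream node it is reached through
    (both inputs are hypothetical; real tools store this differently)
    """
    def upstream_down(node: str) -> bool:
        # Walk up the dependency chain; 'seen' (which starts with the node
        # itself) stops the walk if the map accidentally contains a loop.
        seen = {node}
        parent = depends_on.get(node)
        while parent is not None and parent not in seen:
            seen.add(parent)
            # A parent with no check result is assumed to be up.
            if not status.get(parent, True):
                return True
            parent = depends_on.get(parent)
        return False

    alert: list[str] = []
    suppressed: list[str] = []
    for node, up in sorted(status.items()):
        if up:
            continue
        if upstream_down(node):
            suppressed.append(node)  # unreachable because something upstream failed
        else:
            alert.append(node)       # first failure on its path: worth an alert
    return alert, suppressed
```

With `status = {"router-z": False, "server-a": False, "server-b": False, "server-c": False}` and every server depending on `"router-z"`, this returns `(["router-z"], ["server-a", "server-b", "server-c"])`: one alert for the router instead of four.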