North American Network Operators Group


Re: [cacti-announce] Cacti 0.8.6j Released (fwd)

  • From: Jared Mauch
  • Date: Wed Jan 24 15:41:41 2007

On Wed, Jan 24, 2007 at 08:34:19AM -0500, Jason LeBlanc wrote:
> 
> I would say somewhere around 4000 network interfaces (6-8 stats per int) 
> and around 1000 servers (8-10 stats per server) we started seeing 
> problems, both with navigation in the UI and with stats not reliably 
> updating.  I did not try that poller; perhaps it's worth trying again 
> with it.  I will also say this was about 2 years ago, I think the box 
> it was running on was a dual P3-1000 with a raid 10 using 6 drives (10k 
> rpm I think).
> 
> After looking for 'the ideal' tool for many years, it still amazes me 
> that no one has built it.  Bulk gets, scalable schema and good 
> portal/UI.  RTG is better than MRTG, but the config/db/portal are still 
> lacking.

	So, I've been the caretaker of a few different snmp pollers
over a few years, as well as done some database foo (250m+ rows/day
of data), and these things interrelate in a number of ways.  First,
start with the polling: you need to do bulkget/bulkwalk of the various
mibs to collect the data in a reasonable way, timestamp it all
internally before you "cook" the data, poll frequently enough to
detect spikes (including inaccurate spikes and backwards/missing counter
bugs), etc.
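
	As a rough illustration of the "timestamp, then cook" step, here
is a minimal Python sketch that turns two raw, timestamped counter
samples into a rate.  The names (Sample, rate, max_rate) are made up for
this example, and the policy is an assumption: a negative delta is
treated as a single counter wrap, and any result above a plausibility
ceiling is discarded as a likely reset or backwards-counter bug.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    ts: float              # poller-side timestamp, in seconds
    value: Optional[int]   # raw counter value; None marks an SNMP timeout


def rate(prev: Sample, cur: Sample, bits: int = 32,
         max_rate: Optional[float] = None) -> Optional[float]:
    """Return units/second between two raw samples, or None if unusable."""
    if prev.value is None or cur.value is None:
        return None                      # one side timed out
    dt = cur.ts - prev.ts
    if dt <= 0:
        return None                      # duplicate or out-of-order sample
    delta = cur.value - prev.value
    if delta < 0:
        delta += 1 << bits               # assume exactly one counter wrap
    result = delta / dt
    if max_rate is not None and result > max_rate:
        return None                      # implausible: reset or counter bug
    return result


if __name__ == "__main__":
    # 32-bit octet counter wrapping between two polls 10 seconds apart
    print(rate(Sample(0.0, 2**32 - 500), Sample(10.0, 500)))
```

	Passing max_rate as the interface speed (in the counter's own
units) is one way to keep a counter that went backwards from showing up
as a huge spike; keeping the raw samples means this can be re-cooked
later with a different policy.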

	Take a simple set of data you might want to collect:

router
 interfaces (mib)
  up/down
  in/out octets, in/out packets, in errors/out drops
  speed (ifMIB too?)
 ifMIB (64-bit counters, but only sometimes)
  description 
  speed (interface mib too?)
 mpls ?
  ldp? te? paths?
 mac accounting ?

	then you get into whether you store the raw data you collect, with
markers for snmp timeouts, or just a 5 min calculation/sample  (this
relates to the above 250m rows/day).  how do you define your schema?
how long does it take to insert/index/whatnot the data? how to
handle ifindex moves (not just one vendor too, don't forget that)?
how do you match that link to a customer for billing?  who gets
what reports?  engineering reports too?  provisioning link-in?  tie
to ip address db (interface ip<->customer mapping)?
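
	One possible way to cope with ifindex moves, sketched in Python:
key stored data on a stable internal id looked up by (router, interface
name) and rebuild the ifindex mapping from every walk.  This assumes
the interface name (e.g. from ifName or ifDescr) stays stable across
reboots, which is not true on every platform; InterfaceRegistry and its
methods are hypothetical names for this example only.

```python
class InterfaceRegistry:
    """Maps (router, ifIndex) to a stable id keyed on the interface name."""

    def __init__(self):
        self._ids = {}        # (router, ifname)  -> stable id
        self._by_index = {}   # (router, ifindex) -> stable id
        self._next_id = 1

    def refresh(self, router, walk):
        """walk: dict of ifIndex -> interface name from the latest poll.

        Returns a list of (ifname, old_index, new_index) for moved interfaces.
        """
        # Previous index of each stable id on this router
        old = {sid: idx for (r, idx), sid in self._by_index.items()
               if r == router}
        fresh = {}
        moves = []
        for ifindex, ifname in walk.items():
            key = (router, ifname)
            if key not in self._ids:
                self._ids[key] = self._next_id
                self._next_id += 1
            sid = self._ids[key]
            if sid in old and old[sid] != ifindex:
                moves.append((ifname, old[sid], ifindex))
            fresh[(router, ifindex)] = sid
        # Drop this router's stale index entries, then install the new ones
        self._by_index = {k: v for k, v in self._by_index.items()
                          if k[0] != router}
        self._by_index.update(fresh)
        return moves

    def stable_id(self, router, ifindex):
        """Return the stable id for an index, or None if unknown."""
        return self._by_index.get((router, ifindex))


if __name__ == "__main__":
    reg = InterfaceRegistry()
    reg.refresh("r1", {1: "ge-0/0/0", 2: "ge-0/0/1"})
    # After a reboot the first interface shows up under a new index
    print(reg.refresh("r1", {5: "ge-0/0/0", 2: "ge-0/0/1"}))
```

	The customer/billing mapping would then hang off the stable id
rather than the ifindex, so a renumbering does not split one link's
history in two.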

	the list goes on and on; this is just part of it, let alone
any possible tracking of assets/hardware, or proactive network
monitoring (tie those traps/walks to the internal ping(er)), passive
network monitoring, etc.

	this is a huge burden to figure it all out, implement and
then monitor/operate 24x7.  miss enough samples or data and you
end up billing too little.  this is why most folks have either cooked
their own, or use some expensive suite of tools, leaving just a little
bit of other stuff out there.
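
	To make the "billing too little" point concrete, here is a small
Python sketch.  It assumes burstable billing on a nearest-rank 95th
percentile of 5 min rate samples, and that a missed sample (None) is
counted as zero traffic; neither assumption comes from this thread, and
percentile_95 is just a name picked for the example.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile; None (missed poll) counts as zero."""
    filled = sorted(0.0 if s is None else float(s) for s in samples)
    if not filled:
        raise ValueError("no samples")
    # ceil(0.95 * n) in integer arithmetic to avoid float rounding
    rank = (95 * len(filled) + 99) // 100
    return filled[rank - 1]


if __name__ == "__main__":
    # 100 samples: 90 quiet intervals at 10 Mb/s, 10 busy ones at 100 Mb/s
    complete = [10.0] * 90 + [100.0] * 10
    # Same traffic, but 6 of the busy intervals were lost to snmp timeouts
    lossy = [10.0] * 90 + [100.0] * 4 + [None] * 6
    print(percentile_95(complete), percentile_95(lossy))
```

	Under those assumptions, losing a handful of polls during the
busy periods is enough to pull the billable rate down from the peak
level to the quiet level.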

	in a lot of ways, just buying a ge/10ge and paying some
alternate price for it may be cheaper than a burstable rate as it
could reduce a lot of this extra cost.  i remember hearing that
it cost telcos more to count/track the calls to give you a detailed
bill than to carry the call itself.  this is why flat-rate is nearly king
these days (in the us at least).

	- jared

-- 
Jared Mauch  | pgp key available via finger from [email protected]
clue++;      | http://puck.nether.net/~jared/  My statements are only mine.