North American Network Operators Group


Re: Zebra/linux device production networking?

  • From: Kevin Day
  • Date: Tue Jun 06 18:59:09 2006

On Jun 6, 2006, at 4:42 PM, Nick Burke wrote:

> How many of you have actually use(d) Zebra/Linux as a routing device (core and/or regional, I'd be interested in both) in a production (read: 99.999% required, hsrp, bgp, dot1q, other goodies) environment?
>
> And, if you care to spend this much time, what pitfalls/benefits did you find out about after implementation?

We started out on a FreeBSD/Zebra routing solution for our company (content provider). While it did work acceptably for many years, it wasn't what I'd call robust.

The "router" was a single P4 2.4GHz server. We had 4 GigE ports to 4 uplinks, each giving us a full BGP feed, plus two more GigE ports to our switches. We could route over 750 Mbps easily, without any packet loss or latency.

The biggest issue we had was Zebra's single-threadedness. After a restart of bgpd, it would spend so much CPU time handling the incoming BGP updates that it fell very far behind in processing BGP keepalives, and our sessions would time out before it had finished handling the initial burst. I'd have to shut down all sessions, then bring them up one at a time. The problem wasn't so much that bgpd used a lot of CPU, but that it didn't have much left once the server was handling a few hundred Mbps of traffic. Perhaps a dual-CPU server would have worked better, but we never tried.
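A rough back-of-the-envelope model of that failure, assuming (hypothetically) that a single-threaded bgpd sends no keepalives while it digests its backlog of table updates, and that it only gets whatever fraction of a CPU is left over after forwarding. The function name, its parameters, and the 20 CPU-seconds-per-table figure in the example are illustrative assumptions, not measurements from this box. The hold time is whatever the two peers negotiate (the lower of their two offers); RFC 4271 suggests 90 seconds, which is used as the default here.

```python
def sessions_survive(n_peers: int, cpu_seconds_per_table: float, cpu_share: float,
                     staggered: bool, hold_time: float = 90.0) -> bool:
    """Hypothetical model of keepalive starvation in a single-threaded BGP daemon.

    Assumes the daemon sends no keepalives while it works through its backlog of
    initial table updates, and that it only gets `cpu_share` (0..1] of one CPU
    because packet forwarding consumes the rest.
    """
    if not 0 < cpu_share <= 1:
        raise ValueError("cpu_share must be in (0, 1]")
    # Wall-clock time to digest one full table with the leftover CPU.
    per_table = cpu_seconds_per_table / cpu_share
    # All sessions up at once: every table lands in the backlog together.
    # Staggered: only one table is being digested at a time.
    tables_in_backlog = 1 if staggered else n_peers
    longest_gap = tables_in_backlog * per_table
    # A peer tears the session down if nothing arrives within the hold time.
    return longest_gap < hold_time


# Example with made-up numbers: 4 full feeds, 20 CPU-seconds each, half a CPU free.
print(sessions_survive(4, 20.0, 0.5, staggered=False))  # False (all at once)
print(sessions_survive(4, 20.0, 0.5, staggered=True))   # True (one at a time)
```

With those made-up numbers, bringing four sessions up together leaves a 160-second gap with no keepalives, well past a 90-second hold time, while bringing them up one at a time keeps the gap at 40 seconds. That lines up with the workaround of shutting everything down and re-enabling sessions one by one.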

There were also issues where you could get two Zebra routers deadlocked - they'd both have many megabytes of BGP updates to send each other, and both would want to send a full update to completion before accepting any data in. Mucking with the kernel to allow TCP sockets to have a 16MB receive buffer helped, but still wasn't a cure.
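A toy model of that deadlock, assuming each side insists on writing its entire pending update backlog before it reads anything, and that only a fixed number of bytes (roughly the sender's socket send buffer plus the receiver's receive buffer) can sit unread in each direction. This is a hypothetical illustration, not Zebra's actual I/O code; the function name and sizes are made up.

```python
MiB = 1024 * 1024


def exchange(backlog_a: int, backlog_b: int, capacity: int, chunk: int = 64 * 1024) -> str:
    """Model two peers that each refuse to read until their own backlog is fully written.

    `capacity` is how many bytes can sit unread in one direction.
    Returns "completed" if both backlogs get written, otherwise "deadlock".
    """
    pending = {"a": backlog_a, "b": backlog_b}  # bytes each side still wants to write
    unread = {"a": 0, "b": 0}                   # bytes sent by each side, not yet read by its peer
    peer = {"a": "b", "b": "a"}
    while pending["a"] > 0 or pending["b"] > 0:
        progress = False
        for side in ("a", "b"):
            if pending[side] > 0:
                # Still writing: can only push as much as the buffers will hold.
                n = min(chunk, capacity - unread[side], pending[side])
                if n > 0:
                    pending[side] -= n
                    unread[side] += n
                    progress = True
            elif unread[peer[side]] > 0:
                # Done writing, so this side finally drains what its peer sent.
                unread[peer[side]] = 0
                progress = True
        if not progress:
            # Both sides are blocked on write with full buffers.
            return "deadlock"
    return "completed"


print(exchange(8 * MiB, 8 * MiB, capacity=1 * MiB))     # deadlock
print(exchange(8 * MiB, 8 * MiB, capacity=16 * MiB))    # completed
print(exchange(32 * MiB, 32 * MiB, capacity=16 * MiB))  # deadlock
```

In this model a bigger buffer only helps while the backlog fits inside it, which is consistent with a 16MB receive buffer helping without curing the problem: once a burst of updates is larger than the buffers can absorb, both sides block on write again.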

You're also giving up things like RIBs, fancy queuing/rate limiting, and any kind of hardware acceleration. Doing hundreds of megabits is easy, but software-based routers seem to run into trouble sooner under DoS conditions (lots of tiny packets).
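Some quick arithmetic on why tiny packets hurt: a software router pays a roughly fixed cost per packet, so under a flood what matters is packets per second, not bits per second. The sketch below converts a bit rate into Ethernet frames per second, counting the 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap that every frame occupies on the wire. The frame sizes are the Ethernet minimum (64 bytes) and a full-size untagged frame (1518 bytes); any real traffic mix falls somewhere in between.

```python
WIRE_OVERHEAD_BYTES = 20  # 7-byte preamble + 1-byte SFD + 12-byte inter-frame gap


def frames_per_second(bits_per_second: float, frame_bytes: int) -> float:
    """Ethernet frames per second at a given bit rate and frame size (FCS included)."""
    return bits_per_second / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


for label, bps in (("750 Mbps", 750e6), ("GigE line rate", 1e9)):
    for size in (1518, 64):
        print(f"{label}, {size}-byte frames: {frames_per_second(bps, size):,.0f} frames/s")
```

The same 750 Mbps is about 61,000 frames per second with full-size frames but over 1.1 million with minimum-size ones, roughly 18 times the per-packet work, which is how a CPU that forwards bulk traffic comfortably can fall over under a small-packet flood.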

However, it was about as close to free as you could get. We re-used an old server, and only had to buy some 2-port Ethernet cards. Support for Zebra is pretty iffy, though. More often than not, I'd post a message to the Zebra mailing list to report a bug, and would get a "Yeah, known bug!" reply. The original author has all but abandoned development, leading to a fork called Quagga. Quagga is better (we still use it in a few places), but is still mostly a polished-up Zebra.

In the end, we needed to push more traffic than we could get our Zebra box to handle. A couple of 20+ minute outages during peak usage because of deadlocked bgpd processes helped make my case that we needed to buy some Junipers instead.

I know you're not giving specifics, but any description of how much traffic you intend to push and how many ports you need would help in giving relevant advice. If you're talking about 1 BGP feed for 10 Mbps, I'd say go for it. If you're talking about a dozen sessions and 2 Gbps of traffic... no way. Where you fall between those is what really matters.