North American Network Operators Group


Re: Load balancing in routers

  • From: Rodney Dunn
  • Date: Mon Apr 08 16:51:18 2002

On Mon, Apr 08, 2002 at 02:25:54PM -0400, Richard A Steenbergen wrote:
> 
> On Mon, Apr 08, 2002 at 11:02:11AM -0700, Mark Kent wrote:
> > 
> > >> > load balancing over multiple links uses a flow-hashed method. If you
> > >> > want per-packet load distribution you have to specifically enable it by
> > >> > saying "no ip route-cache" on each interface.
> > >> 
> > >> That is very deadly, please, don't anyone actually try that.
> > 
> > How so?  So it uses a little more cpu, but that may not be relevant in
> > a lot of applications (like down at the T1 level).
> 
> Besides just driving up the CPU load through the roof for no real reason, 
> process switching produced per-packet load balancing. This is not a 
> desirable thing, since it introduces packet reordering which can be VERY 
> detrimental to TCP performance. Just think, if you had a slightly 
> different cable length, packets could spend more time on one wire than 
> another, and become totally out of sync.
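
The reordering described above can be reproduced with a small simulation.
This is only a sketch under made-up assumptions: the link delays, the packet
spacing, the addresses, and the function names are all hypothetical, and the
CRC hash merely stands in for whatever flow hash a real router uses.

```python
import zlib

# Hypothetical one-way delays (ms) for two parallel links; values are made up.
LINK_DELAY_MS = [1.0, 4.0]
PACKET_GAP_MS = 1.0  # spacing between packets entering the router


def arrival_order(packets, pick_link):
    """Return sequence numbers in the order the far end receives them."""
    arrivals = []
    for i, (flow, seq) in enumerate(packets):
        link = pick_link(i, flow)
        arrivals.append((i * PACKET_GAP_MS + LINK_DELAY_MS[link], i, seq))
    return [seq for _, _, seq in sorted(arrivals)]


def per_packet(i, flow):
    # Round-robin: alternate links regardless of which flow a packet is in.
    return i % len(LINK_DELAY_MS)


def per_flow(i, flow):
    # Hash the flow identifier so every packet of a flow takes the same link.
    return zlib.crc32(repr(flow).encode()) % len(LINK_DELAY_MS)


flow = ("10.0.0.1", "192.0.2.9")  # (source, destination), hypothetical
packets = [(flow, seq) for seq in range(1, 7)]
print(arrival_order(packets, per_packet))  # [1, 3, 2, 5, 4, 6]
print(arrival_order(packets, per_flow))    # [1, 2, 3, 4, 5, 6]
```

With per-packet distribution the single flow arrives out of order as soon as
the two links differ in delay; with the flow hash it stays in order because
every packet of the flow rides the same link.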

When you put a packet up to process level, it has to wait until the
IP Input process runs before being switched.  This gets worse since
for every packet you have to find a longest prefix match in the entire RT.
It takes even more time when another process is already running
and other packets are being switched under the rx interrupt on other
interfaces where you do have some sort of fastswitching enabled
(legacy fastswitching or CEF).
This is explained in good depth in "Inside Cisco IOS Software Architecture"
by Bollapragada, Murphy and White. Chapter 2 specifically.
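
To make the per-packet cost concrete, here is a minimal sketch of a longest
prefix match done from scratch for each destination.  The routing table,
next-hop names, and function name are hypothetical, and the linear scan is
not how IOS does it; it only shows that the work is repeated for every
packet when nothing is cached.

```python
import ipaddress

# Hypothetical routing table: prefix -> next hop (all values are made up).
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream",
    ipaddress.ip_network("10.0.0.0/8"): "core",
    ipaddress.ip_network("10.1.0.0/16"): "pop-a",
    ipaddress.ip_network("10.1.2.0/24"): "customer-t1",
}


def longest_prefix_match(dst, routes=ROUTES):
    """Walk every route and keep the most specific prefix containing dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net in routes:
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return routes[best] if best is not None else None


print(longest_prefix_match("10.1.2.3"))    # customer-t1
print(longest_prefix_match("10.200.0.1"))  # core
print(longest_prefix_match("8.8.8.8"))     # upstream
```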

While process switching does introduce delay/jitter and reduced throughput
because of out-of-order packets, there is another problem with the
old fastswitching model: cache maintenance.  Back when NIMDA hit (sorry to
bring it back up), Cisco routers running traditional fastswitching were
hit harder because the CPU was constantly doing cache invalidations.
With CEF that is not an issue.

The GSR is completely dCEF for normal IP forwarding.
There is no fallback path like there is on the other routers.
On those other routers, if a packet comes in and, due to a feature
configuration, it can't be CEF/dCEF switched, it falls back to the
traditional fastswitching routines to try and switch the packet.  The first
packet would be punted to process level (which is another drawback of the
traditional fastswitching approach), where the cache would be built so the
next packet to that destination could be switched without being sent to
process level.
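
A toy model of a demand-built cache may help show why the first packet is
expensive and why invalidations hurt.  Everything here (the class, the
counters, the addresses, the placeholder lookup) is hypothetical and greatly
simplified; it is not the IOS implementation.  The scanning loop is only an
illustration of traffic that defeats such a cache.

```python
class DemandCache:
    """Toy destination cache: filled by the slow path, flushed on change."""

    def __init__(self, slow_lookup):
        self.slow_lookup = slow_lookup  # stands in for the process-level path
        self.cache = {}
        self.punts = 0  # packets that had to take the slow path

    def switch(self, dst):
        if dst not in self.cache:
            # Cache miss: the packet is punted, and the result is cached
            # so later packets to the same destination skip the slow path.
            self.punts += 1
            self.cache[dst] = self.slow_lookup(dst)
        return self.cache[dst]

    def invalidate(self):
        # A routing change discards the entries; punts rebuild them.
        self.cache.clear()


def slow_lookup(dst):
    # Placeholder for a full routing-table lookup.
    return "link-1" if dst.startswith("10.") else "link-2"


steady = DemandCache(slow_lookup)
for dst in ["10.0.0.1", "10.0.0.1", "10.0.0.1"]:
    steady.switch(dst)
steady_punts = steady.punts  # 1: only the first packet was punted

scan = DemandCache(slow_lookup)
for n in range(100):
    scan.switch(f"198.51.100.{n}")  # every packet targets a new destination
scan_punts = scan.punts  # 100: the cache never helps

steady.invalidate()
steady.switch("10.0.0.1")
after_flush_punts = steady.punts  # 2: the entry had to be rebuilt
```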

 
>       Input flow
>            1
>            2
>            3
>            4
>            5
>  Link 1    6    Link 2
>    1               2
>    3               4
>    5               6
>            2
>            1
>            3
>            4
>            6
>            5
>       Output flow
> 
> 
> > I've had a customer on the end of 8 T1, no ip route cache, on a 4700
> > (their end) and a 7206/300 (my end).  4700 runs a little hot, but survives.
> > 
> > Similarly, I currently have a couple of 4*T1, a 3*T1, and several 2*T1
> > on PA-MC-T3 ports on a 7206/300 with no issues whatsoever.  Max cpu
> > usage is 35%.  Everything works.
> 
> If all you want to do is a few T1's on an NPE300, you'll be fine. I'm
> certain Alex is used to doing more and scraping every last packet out of
> his routers. :)
> 
> > Now, contrast that with my first use of cef, this was back when the
> > only cef configuration was "ip cef" or something similar.  Very
> > difficult to screw things up when the config is a one-liner, and yet
> > when I turned this on the 7206 immediately crashed.  
> 
> It's really not much more complex now. I saw some "CEF Watchdog" (to check
> for dCEF corruption) type functionality in recent 12.0S builds, but on a
> 7200 it doesn't matter since is distributed.

What you are referring to is the CEF Inconsistency Checker.
http://www.cisco.com/warp/public/105/cefincon.html

It scans the RT on the RSP/GRP and then compares it to what is on the LCs
to catch situations where they get out of sync.  If and when that does
happen, it's a bug no matter what triggered it.
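
The compare step can be pictured roughly as below.  This is a sketch of the
idea only: the table layout, slot names, and function name are hypothetical,
and the real checker is described in the document linked above.

```python
def find_inconsistencies(rp_table, lc_tables):
    """Compare each line card's prefix -> next-hop table with the RP copy."""
    problems = []
    for lc_name, lc_table in sorted(lc_tables.items()):
        for prefix in sorted(rp_table.keys() | lc_table.keys()):
            rp_hop = rp_table.get(prefix)
            lc_hop = lc_table.get(prefix)
            if rp_hop != lc_hop:
                # Missing on one side (None) or pointing somewhere different.
                problems.append((lc_name, prefix, rp_hop, lc_hop))
    return problems


rp = {"10.1.0.0/16": "pop-a", "10.2.0.0/16": "pop-b"}
line_cards = {
    "slot0": {"10.1.0.0/16": "pop-a", "10.2.0.0/16": "pop-b"},
    "slot1": {"10.1.0.0/16": "pop-a"},  # missing a prefix
}
print(find_inconsistencies(rp, line_cards))
# [('slot1', '10.2.0.0/16', 'pop-b', None)]
```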


> 
> As for your crash... Well, my first guess is that you were running the
> "wrong" IOS image. 7200's are simple enough that they are usually safe to
> run whatever the "newest" code on. That practice will get you burned
> on GSR's. But in the end... It's Cisco, what do you expect. Call TAC or 
> try again with new code. :)

CEF is on by default on the 72xx after 12.2(6), 12.1(12), and 12.1(10)E
via: CSCdu81146

hth,
rodney

> -- 
> Richard A Steenbergen <[email protected]>       http://www.e-gerbil.net/ras
> PGP Key ID: 0x138EA177  (67 29 D7 BC E8 18 3E DA  B2 46 B3 D8 14 36 FE B6)