North American Network Operators Group


Re: Anycast applicable to Radius Server Farm ?

  • From: Joe Maimon
  • Date: Mon May 08 12:19:29 2006


Joe Shen wrote:
Can you indicate in more detail what the problems
were with the L4 switch?
We separate our Radius servers into two farms; each
farm has an L4 switch in front. To our understanding,
radius authentication info. and accounting info. of a
PPPoE session should be processed by the same Radius
server.

I don't think that's true. If the auth radius server fails to respond, authentication and accounting will then go to the next configured server.

So, although the L4 switch provides a single IP
for BRAS configuration, each BRAS is assigned a real
server IP in the L4 switch. This causes these problems:

1) Load is not balanced automatically but by human
estimation; there is a server whose load is twice that
of some other server.

See if you can extract load from the radius server using SNMP or something and make your L4 switch utilize that.
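As a rough sketch of that idea, the Python below turns per-server load readings into balancing weights. How the load is actually polled (SNMP or otherwise) and how a given L4 switch accepts weights are not covered here; the function name, the server names, and the 0.0 to 1.0 utilisation scale are all assumptions made for illustration.

```python
def weights_from_load(load_by_server, floor=0.05):
    """Map per-server utilisation (0.0-1.0) to normalised balancing weights.

    Hypothetical helper: the load values are assumed to have been polled
    already (for example over SNMP); no polling is done here.
    """
    # Weight each server by its spare capacity. The floor keeps a very busy
    # server from dropping to zero weight (an assumed policy choice).
    headroom = {s: max(1.0 - load, floor) for s, load in load_by_server.items()}
    total = sum(headroom.values())
    return {s: h / total for s, h in headroom.items()}


# Example: one server carrying twice the load of the others.
loads = {"radius1": 0.80, "radius2": 0.40, "radius3": 0.40}
weights = weights_from_load(loads)
for name, weight in sorted(weights.items()):
    print(f"{name}: {weight:.2f}")
```

The busiest server ends up with the smallest share of new sessions, which is the automatic rebalancing that the manual BRAS-to-server mapping cannot give.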

2) The L4 switch becomes a bottleneck for service
availability. In past years, the L4 switch caused
several service failures. Just last Friday, the L4
switch stopped responding to any network packets while
its Ethernet interface seemed OK.

Add a couple of the actual servers' IPs to the AAA servers the NASes use.

3) As the L4 switch is the only entrance to a single
server farm, a DoS attack or some kind of software
bug will surely degrade the security level, while a
farm using ECMP relies on the server group to resist
DoS attacks.

Your firewalls should be protecting your radius servers from DoS -- unless you really expect the world to communicate with them. Spoofed sources, however, could be hard to protect against.
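To illustrate the kind of filtering meant here, this is a minimal sketch of a source allowlist in Python using the standard `ipaddress` module. The networks are documentation-range placeholders standing in for real NAS/BRAS addresses, and the names are hypothetical; a real deployment would express the same rule as a firewall ACL rather than in application code.

```python
import ipaddress

# Hypothetical NAS/BRAS source ranges (documentation prefixes, not real ones).
ALLOWED_NAS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/28"),
    ipaddress.ip_network("198.51.100.16/29"),
]


def is_allowed_source(src_ip):
    """Return True if a RADIUS packet's source address belongs to a known NAS range."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in ALLOWED_NAS_NETWORKS)


print(is_allowed_source("192.0.2.5"))    # inside the first range
print(is_allowed_source("203.0.113.9"))  # not a known NAS
```

A check like this only looks at the claimed source address, so it does nothing against spoofed packets that forge a NAS address.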

4) Maintenance is a little bit costly. Any maintenance,
whether on a radius server or on the L4 switch, needs
a scheduled time window.

5) Service protection is hard (as you mentioned, the
'cascade' one). As there are two server farms, if one
farm fails it takes ten or more minutes to migrate
its Radius traffic to the other farm. This is
unacceptable.

Let the NAS do it. They fail over much faster than that.

Whatever you choose, try to combine the ability of the NAS to fail over between radius servers into your redundancy plan.
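A small Python simulation of that NAS-side behaviour is sketched below: the client walks an ordered server list, retries each entry a fixed number of times, and moves on when there is no reply. The server names, retry count, and timeout are illustrative assumptions, not defaults from any particular NAS, and `fake_send` stands in for a real RADIUS exchange.

```python
def send_with_failover(servers, send, retries=2):
    """Try each server in order; each gets 1 + retries attempts.

    `send(server)` is a caller-supplied function returning a reply, or None
    when the request timed out. Returns (server, reply, attempts_made).
    """
    attempts = 0
    for server in servers:
        for _ in range(1 + retries):
            attempts += 1
            reply = send(server)
            if reply is not None:
                return server, reply, attempts
    return None, None, attempts


def worst_case_failover_seconds(dead_servers, timeout_s, retries):
    """Time spent waiting on dead servers before the first live one is tried."""
    return dead_servers * (1 + retries) * timeout_s


def fake_send(server):
    # Stand-in for a real request: the first farm's address is unresponsive.
    return None if server == "farm-a-vip" else "Access-Accept"


# Hypothetical ordered list: the first farm's VIP, then a real server in the other farm.
result = send_with_failover(["farm-a-vip", "farm-b-real1"], fake_send, retries=2)
print(result)
print(worst_case_failover_seconds(dead_servers=1, timeout_s=3, retries=2), "seconds")
```

With the assumed 3-second timeout and 2 retries, one dead entry costs 9 seconds before the next server answers, compared with the ten or more minutes quoted for a manual farm migration.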