North American Network Operators Group
Re: Scalability issues in the Internet routing system
Blaine Christian wrote:
> Another thing, it would be interesting to hear of any work on breaking
> the "router code" into multiple threads. Being able to truly take
> advantage of multiple processors when receiving 2M updates would be the
> cat's pajamas. Has anyone seen this? I suppose MBGP could be rather
> straightforward, as opposed to one big table, in a multi-processor
> implementation.
>
> You may want to read this thread from the beginning. The problem is not

Sure, if you have a bottleneck at FIB insertion you fail much earlier. I'd say if that happens it's an engineering oversight or a design tradeoff. However, I don't think this is the choke point in the entire routing table size equation. Depending on the type of prefix churn, you don't have that many transactions reaching the FIB. Most far-away churn doesn't change the next hop, for example. Local churn, when direct neighbors flap, mostly just changes the next hop (egress interface). In a high-performance ASIC/TCAM/whatever FIB, a next-hop change can be done quite trivially. A prefix drop can be handled by marking the entry invalid and garbage-collecting it later. A prefix insertion may either salvage an invalidated prefix or have to be re-inserted. The insertion time depends on the algorithms of the FIB table implementation. For all practical purposes a FIB can be designed to be quite speedy in this regard without busting the budget.

The link speed between two DFZ routers has seldom been the limit for initial routing table exchanges. Neither has TCP. It is mostly dominated by the algorithm choice and the CPU of the RIB processor on both ends.

BTW, I do agree that spinning new ASICs is a non-trivial task, and it is certainly the task you want to get started on quickly when building a new system. It is non-trivial because of the prefix storage size and the ultra-fast lookup times required. Longest-prefix match is probably the most difficult thing to scale properly, as a search must always be done over a number of overlapping prefixes.
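The overlapping-prefix point can be made concrete with a toy example. The sketch below (my own illustration, not any router's actual data structure) implements longest-prefix match with a binary trie: a lookup cannot stop at the first covering prefix it finds, it has to keep walking and remember the longest match seen so far, which is exactly the part that is expensive to do at line rate in hardware.

```python
# Toy longest-prefix-match FIB for IPv4 (illustration only).
import ipaddress

class TrieNode:
    __slots__ = ("children", "prefix")
    def __init__(self):
        self.children = [None, None]
        self.prefix = None  # set if a route terminates at this node

class Fib:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, cidr):
        net = ipaddress.ip_network(cidr)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            b = (bits >> (31 - i)) & 1
            if node.children[b] is None:
                node.children[b] = TrieNode()
            node = node.children[b]
        node.prefix = net

    def lookup(self, addr):
        bits = int(ipaddress.ip_address(addr))
        node, best = self.root, None
        for i in range(32):
            if node.prefix is not None:
                best = node.prefix          # shorter covering prefix seen
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break                        # no longer match possible
        else:
            if node.prefix is not None:
                best = node.prefix
        return best

fib = Fib()
for p in ("10.0.0.0/8", "10.1.0.0/16", "10.1.2.0/24"):
    fib.insert(p)
print(fib.lookup("10.1.2.3"))   # -> 10.1.2.0/24, longest of three overlaps
print(fib.lookup("10.9.9.9"))   # -> 10.0.0.0/8
```

Dropping either the overlapping prefixes or the longest-match requirement would reduce this to a single exact lookup, which is Andre's point below.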
To scale this much better and remove the bottleneck, you may drop either the 'overlapping' part or the 'longest-match' part, and the world suddenly looks much brighter. This is the crucial thing that got forgotten during the IPng design phase, which brought us IPv6. So far we have learned that limiting the number of IPv[46] prefixes in the DFZ is not an option for commercial and socio-technical reasons. That leaves only the other option: changing the routing lookup to something with better scaling properties.

> I did read your comment on BGP lending itself to SMP. Can you elaborate
> on where you might have seen this? It has been a pretty monolithic
> implementation for as long as I can remember. In fact, that was why I
> asked the question, to see if anyone had actually observed a functioning
> multi-processor implementation of the BGP process.

I can make the SMP statement with some authority, as I have done the internal design of the OpenBGPd RDE and my co-worker Claudio has implemented it. Given proper locking of the RIB, a number of CPUs can crunch on it and handle neighbor communication independently of each other. If you look at Oracle databases, they manage to scale performance by a factor of 1.9-1.97 per CPU. There is no reason to believe we can't do this with the BGP 'database'.

--
Andre
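The locking idea can be sketched in a few lines. This is a hypothetical illustration of the general pattern, not the OpenBGPd RDE itself: one worker per neighbor parses and validates updates independently, and only the actual table write is serialized under a RIB lock, so most of the per-neighbor work runs in parallel.

```python
# Hypothetical per-neighbor workers sharing a locked RIB (illustration).
import threading

class Rib:
    def __init__(self):
        self._lock = threading.Lock()
        self.routes = {}                  # prefix -> (nexthop, peer)

    def update(self, prefix, nexthop, peer):
        with self._lock:                  # serialize only the table write
            self.routes[prefix] = (nexthop, peer)

    def withdraw(self, prefix, peer):
        with self._lock:
            cur = self.routes.get(prefix)
            if cur is not None and cur[1] == peer:
                del self.routes[prefix]

def neighbor_worker(rib, peer, updates):
    # Parsing/validation of each UPDATE would happen here, outside the
    # lock, so neighbors are handled independently of each other.
    for prefix, nexthop in updates:
        rib.update(prefix, nexthop, peer)

rib = Rib()
threads = [
    threading.Thread(
        target=neighbor_worker,
        args=(rib, peer, [("10.%d.0.0/16" % peer, "192.0.2.%d" % peer)]),
    )
    for peer in (1, 2, 3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(rib.routes))  # one prefix learned per neighbor
```

A real implementation would lock at a finer granularity (per prefix or per table shard) to approach the near-linear per-CPU scaling Andre cites, but the division of labor is the same: independent neighbor handling, shared synchronized RIB.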