North American Network Operators Group

RE: [load balancing] RE: LoadBalancing products: Foundry ServerIron

  • From: Karyn Ulriksen
  • Date: Thu Jul 06 13:54:45 2000

This was my experience with Arrowpoint:

(BTW, we ended up abandoning the architecture outlined below.  It was
originally designed with certain goals in mind, and we later decided to go
another route.)

###########################INTERNAL MEMO ####################
Date:	29 December 1999

To:	#########, President
	#########, Sr. Vice President Technology

From:	##########, Chief Systems Architect

RE:	Arrowpoint Communications CS100 / CS800 Solution




The review and testing of the Arrowpoint Communication products CS100 and
CS800 as possible solutions for the CompanyXYZ distributed virtual hosting
platform has been concluded and this product line has been determined to be
insufficient to our defined goals. This conclusion is based on development
testing, production service, information provided by the upper levels of
technical support & senior development staff, and Arrowpoint defined
development timelines.  Additional facts pertaining to Arrowpoint’s sales
practices are referenced here for consideration in future product
assessments.


Architectural Goals for Product
Outline of SystemXYZ Architectural Goals as initially provided to Arrowpoint
Communications.  These goals were repeated in their entirety to various
technicians, developers, and technical managers:

	Two clusters (one unix, one NT) initially served by one CS100
device.  Each Cluster will initially retain two servers but should be able
to scale up to at least 16 servers per cluster.  Each Cluster is expected to
support 2000 websites for a total of 4000 websites.

	Due to the unique architecture of the platform, each server will
maintain unique bindings for each website instance.  DNS portrayal to the
Internet will be propagated as one (1) single IP to represent each cluster.
The Arrowpoint device will intercept the intended address and based on layer
4 HTTP/1.1 compliant headers will determine redistribution based on the
‘Host’ key-value pair of the HTTP header.  The redistribution rules will
utilize NAT to redirect request to local IP:Port binding unique to each
website/server instance.
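
The redistribution described above can be sketched roughly as follows.  This
is only an illustration in Python, not Arrowpoint's implementation: the rule
table, hostnames, addresses, and the least-active-connections choice are all
assumptions made for the example.  It reads the 'Host' header from a request
and picks one of the IP:Port bindings registered for that site.

```python
from typing import Dict, List, Optional, Tuple

Backend = Tuple[str, int]  # (IP, port) binding on one server

# Hypothetical rule table: Host value -> one IP:Port binding per server.
RULES: Dict[str, List[Backend]] = {
    "www.site-a.example": [("10.0.1.11", 8001), ("10.0.1.12", 8001)],
    "www.site-b.example": [("10.0.1.11", 8002), ("10.0.1.12", 8002)],
}


def parse_host(request_head: bytes) -> Optional[str]:
    """Return the lower-cased Host header value without its port, or None."""
    for line in request_head.split(b"\r\n")[1:]:  # skip the request line
        if not line:  # a blank line ends the header block
            break
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host = value.strip().decode("ascii", "replace").lower()
            # Drop an optional ":port" suffix (IPv6 literals are not handled).
            return host.split(":", 1)[0]
    return None


def select_backend(request_head: bytes,
                   active: Dict[Backend, int]) -> Optional[Backend]:
    """Pick the binding for the requested site with the fewest active
    connections (an assumed stand-in for load-based selection)."""
    host = parse_host(request_head)
    candidates = RULES.get(host or "", [])
    if not candidates:
        return None
    return min(candidates, key=lambda b: active.get(b, 0))
```

A request without a Host header (for example plain HTTP/1.0) matches no rule
in this sketch, which is why the design depends on HTTP/1.1 compliant clients.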

	Rule utilization is as follows:

		2 Clusters x 2000 sites  =	4000 HTTP Services
		2 Clusters x 2 Servers x 2000 IP:Ports	= 8000 rules
		
		Persistence methods:	Load (primary), cookie (on request)

		Expected Throughput for both clusters:	8 Mbps to 3 Gbps
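
The rule arithmetic above can be reproduced with a few lines of Python.  This
is a sketch that assumes one rule per IP:Port binding, as in the figures
above; the function name is made up for the example.

```python
from typing import Tuple


def rule_counts(clusters: int, sites_per_cluster: int,
                servers_per_cluster: int) -> Tuple[int, int]:
    """Return (HTTP services, rules), assuming one rule per IP:Port binding."""
    services = clusters * sites_per_cluster
    rules = clusters * servers_per_cluster * sites_per_cluster
    return services, rules


# Initial deployment: 2 clusters, 2000 sites each, 2 servers each.
initial = rule_counts(2, 2000, 2)

# Scaling target from the goals: 16 servers per cluster.
scaled = rule_counts(2, 2000, 16)
```

With these inputs the initial deployment gives 4000 services and 8000 rules,
and the 16-server target gives 64000 rules for the same 4000 services, so the
rule count grows with every server added.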

	

Product Claims
White papers made available online for full release versions and beta
versions detailed that expectations were well within defined product
guidelines.  These technical outlines were reinforced with product
documentation upon delivery.  Discussions with the attending salesman and
sales engineers confirmed that the product could comfortably achieve the
defined goals.  Based upon discussions regarding development cycles and
product extensibility, it was determined that product testing was desirable.


Testing Results
Testing and review consisted of several phases including manageability,
resource utilization, network throughput, reliability, and scalability of
two versions of full release software and several beta versions.

Manageability:  All CLI based management is straightforward.  Configuration
is for the most part logical and hierarchical.  Later in testing, it became
apparent that some features needed some massaging.  Stalling while scrolling
through large configurations is a known bug, to be addressed in a version due
for release in mid-2000 using “collapse-able” configurations.

Resource Utilization:  Concerns first arose when 40 sites (20 per cluster x
2 clusters) were configured and tested.  CPU utilization climbed to 38% and
stayed there.  This test included NAT, round robin, and approximately
35 Mbps of traffic.  Later testing indicated that CPU utilization rose
in conjunction with rule sets and load.  Addressing this issue became the
focus of the next four months of testing and product revisions, and a source
of major concern.

During the following months, various concessions were made to the SystemXYZ
platform to accommodate redefinitions of the CS100 resource availability.
Although Arrowpoint sales staff and sales engineers identified certain ranges
that could be expanded to accommodate larger rule sets than those defined
within the technical specification papers, it quickly became apparent that
the figure of 5000 rule sets given in the white sheets was inflated.
Apparently, modifications in the software that added functionality also
introduced additional memory requirements and these were not reflected in
the white sheets.  Over a period of six months, we went from assurances that
the CS100 product could accommodate 5000+ rule sets down to Arrowpoint
offering to give us three additional CS100s to support a total of 250 rule
sets, after determining that 250 rule sets was the level that configuration
could reliably support.  Additionally, under this
configuration, certain methods of determining service status were no longer
available.  This meant that service status was only determined via ICMP
calls which can indicate whether a server is available on a network, but
which are poor indicators of a running WWW server.  A review of future code
revisions indicated that by mid-2000, a partial solution would probably be
made available.
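
To illustrate the difference, an application-level check asks the web server
itself for a response instead of pinging the host.  The following is a
minimal sketch using Python's standard http.client module; the function name,
the HEAD request on "/", and the "status below 500" criterion are assumptions
for the example, not a description of any Arrowpoint feature.

```python
import http.client


def http_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server at host:port answers HEAD / with a
    status below 500.  A host that answers ICMP but has no web server
    listening fails this check."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().status < 500
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response.
        return False
    finally:
        conn.close()
```

A check like this would be run per IP:Port binding, so a single dead website
instance can be taken out of rotation while the server still answers ping.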

A review of the CS800 solution offers an impractical remedy for the resource
demands.  The CS800 offers only a 100% to 200% increase in processing
capabilities due to the software architecture’s memory demands.
Explorations of memory expansion capabilities with Arrowpoint development
members resulted in the consensus that the boards they base the architecture
on could not physically accommodate the additional memory that would be
required to meet our goals.  The CS800 uses up to two of these same boards,
with up to 256MB RAM each.  This revelation indicated that our concept of the
CS800 as an upgrade path was invalid, since all experience with the product
indicated that 512MB would be required to run 250 services with 2 rule sets
each.
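
As a rough back-of-the-envelope check of that conclusion, the memory figures
can be extrapolated to the original goal of 8000 rules.  The sketch below
assumes memory use scales linearly with the number of rules, which is an
assumption for illustration and was not measured; the function names are
made up.

```python
def mb_per_rule(total_mb: float, services: int, rules_per_service: int) -> float:
    """Memory per rule, assuming every rule costs the same amount."""
    return total_mb / (services * rules_per_service)


def projected_mb(target_rules: int, per_rule_mb: float) -> float:
    """Memory needed for target_rules under the linear-scaling assumption."""
    return target_rules * per_rule_mb


# Figures from the memo: 512MB for 250 services with 2 rule sets each.
per_rule = mb_per_rule(512, 250, 2)

# Original goal: 8000 rules across both clusters.
needed = projected_mb(8000, per_rule)

# A fully populated CS800: two boards of up to 256MB each.
cs800_max_mb = 2 * 256
```

Under that assumption each rule costs about 1MB, so 8000 rules would need
roughly 8GB, about 16 times what a fully populated CS800 can hold.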

Network Throughput:  As a matter of course, two moderate volume Anonymous
FTP servers were introduced to the CS100.  The two servers only serviced
FTP and telnet access (for maintenance).  The CS100 was configured with 20
basic WWW / Round robin rulesets for use by SystemXYZ platform developers
and the handful of rulesets required to support the two FTP servers.  The
FTP servers were returned to standard TCP/IP Round Robin DNS outside of the
CS100 environment after several code revisions, a full code release, and
several undefined CS100 device crashes resulting in either no restoration of
services or partial restoration of service.  Through two full releases and
various in-between beta releases, a bug was apparently allowed to persist
that did not release closed sessions, which resulted in up to 250 stale FTP
sessions that only cleared on power cycling (but apparently not on
crashing).  The claimed 5 Gbps on the switch fabric was never reached in
testing due to our own resource limitations in benchmarking.  However,
35 Mbps of WWW traffic did test successfully using the resources defined
above.


Reliability:  Throughout the six-month review and testing of the CS100
product, various code versions, both full releases and betas, were explored.
Beta code was not considered when judging the stability of the product,
except as an indication of Arrowpoint's progress in addressing ongoing
stability issues in release versions.  The first release version that was
provided was stable.  The beta versions of the following release were
understandably unstable, but apparently did not address an “undefined”
crashing problem that continued to occur in the full release.  The consensus
is that some type of memory leak causes the crashing; however, up to the
time of our decision to pursue other solutions, the issue of the device
crashing with or without load had not been resolved.


Scalability:  Due to the problems with other facets of the Arrowpoint
product, scalability testing was never reached.


Product Development Goals
Arrowpoint personnel are courteous professionals who devote a great deal of
time to seeking acceptable solutions.  The level of this devotion was one of
the reasons that attracted us to Arrowpoint and kept us pursuing the product
for six months.  Arrowpoint had indicated their willingness to incorporate
enhancements to their product based on feedback from SystemXYZ and there is
no doubt that Arrowpoint would have accommodated us to the limit of their
hardware.  Discussions with Senior Developers of the Arrowpoint product line
outlined their development cycle for the next year which, unfortunately, did
not include utilizing or exploring solutions to the memory barrier that was
physically limited by the circuit boards they used for the CS100, CS150, and
CS800 products.  Additionally, existing memory would be consumed by new
functions that were being added to the code.


Summary Conclusion

Despite both Arrowpoint’s and SystemXYZ’s best efforts, the Arrowpoint
product line is not well suited to the SystemXYZ virtual webhosting platform
or for recommendation to CompanyXYZ clients interested in load balancing
solutions.  It may come to pass that Arrowpoint will find a solution to
their physical memory limitations beyond their currently defined development
cycle that will result in their ability to support the number of rulesets
outlined in their white sheets.  As of this writing, the supportable rule
sets have dropped sharply from 5,000 on one box to 250 across four boxes.
Although Arrowpoint offered to provide the three additional boxes at no
additional fee, subsequent box requirements would be associated with a fee,
not to mention the cost of the rackspace for the additional devices.
Additionally, coordination of the management of four devices to service 250
sites would introduce its own instability factors.

The current combination of the additional inferred costs, loss in critical
supported feature sets, and the projected development cycle that does not
include resolution to memory constraints outweighs any prospects for
continuing to pursue the Arrowpoint products as a practical solution.  Once
the product has survived its infancy, the feature set that Arrowpoint has
outlined will certainly be of interest to many hosting companies.   

As for recommendations to clients, there is a wide variety of load
balancing platforms with minimal feature sets that provide acceptable
load balancing at a much lower price point and, at minimum, with
reputations for stability.