North American Network Operators Group

Flash crowds and network management

  • From: Sean Donelan
  • Date: Sun Sep 13 16:44:28 1998

>It was small as most web pages are today. I don't think the government
>servers were hit even close as hard as some of the big news sites. I bet
>CNN has 10 x the hits the LOC site had. NetRail hosted the servers for US
>Treasury and USDA. We gave them a 100 BaseT ethernet connection into a
>core router, but it never was a big deal because their servers would die
>way before the link utilization ever got high. I have found many
>government sites are like that. I would not be surprised if LOC's servers
>died before the links maxed out. 

I tend to have a more holistic approach to customer service.  The
customer's servers dying under the network load concerns me as
a network manager even though link utilization never got high.

I wouldn't jump to the conclusion that the government servers were not
hit nearly as hard as some of the big news sites; it depends on
your definition of 'hit.'  Hits for traffic management purposes are
different than hits for advertising purposes.  Your servers stop
counting 'hits' when they are down, but that doesn't mean the requests
stop.  Outbound traffic goes down when the servers die, but inbound
traffic doesn't stop.
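
To make the counting point concrete, here is a minimal sketch in Python.
The per-second request rates and the up/down pattern are made-up numbers
for illustration, not measurements from any real site, and the function
name is hypothetical.

```python
def count_requests(offered_per_sec, server_up):
    """Return (offered, logged) totals over a run of one-second intervals.

    offered_per_sec: requests arriving from the network in each second.
    server_up: whether the server was alive (and logging) in that second.
    """
    offered = sum(offered_per_sec)
    # A dead server logs nothing, but the requests still arrived.
    logged = sum(n for n, up in zip(offered_per_sec, server_up) if up)
    return offered, logged


# Hypothetical storm: the server falls over for three seconds at the peak.
offered_per_sec = [100, 400, 900, 900, 900, 300]
server_up = [True, True, False, False, False, True]

offered, logged = count_requests(offered_per_sec, server_up)
print(f"offered={offered} logged={logged}")  # offered=3500 logged=800
```

With these assumed numbers the server log would report 800 hits while the
network delivered 3500 requests, so the log undercounts most at exactly
the moment the load is worst.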

I'd like to chat with the webserver managers at some of
the large media web farms someday, not necessarily with a reporter
listening in, about what they were seeing.  But at the moment there
isn't a real good way for us to communicate in real-time about what
is going on.

Think of the worst SYN or SMURF attack you've ever seen and then
combine them.  The Internet doesn't have the equivalent of "choke"
exchanges found in local telephone exchanges.  And if you think
the phone network is any more reliable, remember what happened
when Garth Brooks tickets went on sale in the Capital a few years
ago.

A useful addition to CAR would be a clear way to limit SYN packets per
second while letting other traffic through, so that once a person gets
connected they clear out as fast as possible.  Using the 'established'
keyword in an access list and guessing at the size of SYN packets gets
part of this.  For traffic management purposes, you want to distribute
the choke points around the entire backbone rather than just at the one
hot interface.  The queue discipline gets a bit weird because of
"duplicate" SYN packets and essentially zero packet inter-arrival
spacing.  I have a gut feeling that robots were a problem, but editing
access-lists in the middle of a storm wasn't a good solution.
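
The SYN-limiting idea can be sketched as a token bucket applied only to
connection-opening packets.  This is a rough illustration in Python, not
how CAR or any router actually implements rate limiting; the class and
function names, the rate, and the burst size are all assumptions.  An
initial SYN is taken to be a TCP segment with SYN set and ACK clear, and
everything else is forwarded untouched.

```python
SYN = 0x02  # TCP SYN flag bit
ACK = 0x10  # TCP ACK flag bit


class TokenBucket:
    """Hypothetical limiter: 'rate' tokens per second, at most 'burst' stored."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now):
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def is_initial_syn(tcp_flags):
    # SYN set and ACK clear: a new connection attempt, not a SYN-ACK.
    return bool(tcp_flags & SYN) and not (tcp_flags & ACK)


def forward(tcp_flags, now, bucket):
    """Return True if the packet should be forwarded."""
    if is_initial_syn(tcp_flags):
        return bucket.allow(now)  # only new connections are rate limited
    return True  # traffic on existing connections always passes


# Ten SYNs arrive back to back (zero inter-arrival spacing) at t=0.
bucket = TokenBucket(rate=2.0, burst=2.0)
passed = sum(forward(SYN, 0.0, bucket) for _ in range(10))
print(passed)                      # 2
print(forward(ACK, 0.0, bucket))   # True: established traffic is not limited
```

Duplicate SYNs from retransmitting clients spend tokens like any other
SYN in this sketch, which is part of the queue-discipline weirdness
mentioned above.  Running one such bucket at each of several ingress
points, rather than only at the hot interface, would spread the choke
points around the backbone.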
-- 
Sean Donelan, Data Research Associates, Inc, St. Louis, MO
  Affiliation given for identification not representation