North American Network Operators Group


Re: Is anyone actually USING IP QoS?

  • From: hardie
  • Date: Tue Jun 15 18:20:42 1999

Jamie writes:
> While this thread is slowly drifting, I disagree with your assertion that so
> much of the web traffic is cacheable (NLANR's caching effort, if I remember,
> only saw around 60% of requests hit in the cache, pooled over a large number
> of clients; that is probably close to the true percentage of cacheable
> content on the net).  If anything, the net is moving to be *more* dynamic.
> The problem is that web sites are putting unrealistic expires on images and
> html files because they're being driven by ad revenues.  I doubt that any of
> the US-based commercial websites are interested in losing the entries in
> their hit logs.  Caching is the kind of thing that is totally broken by
> session-ids (see sites like amazon.com and cdnow).
> 
> The only way caching is going to be truly viable in the next 5 years is
> either for a commercial company to step in and work with commercial
> content providers (which is happening now), or for webserver software
> vendors to work with content companies on truly embracing a hit
> reporting protocol.

The workshop results from the last IRCACHE workshop have some
interesting data on hit rates in a variety of caches
(http://workshop.ircache.net/ for the main program).  In general, it
is even worse than you assert; hit rates as low as 40 percent are
common, even for a cache serving a large number of users.  There has,
however, been a fair amount of work to determine which algorithms for
cache replacement are effective; John Dilley, in particular, has
implemented several for Squid (the IRCACHE group's example cache
engine).
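
To give a flavor of that work, here is a toy sketch (Python, purely
illustrative; the class and variable names are mine, and the real
Squid implementations are in C with far more detail) of the
GreedyDual-Size-Frequency idea, one of the size-aware policies from
that line of work.  An object's priority rises with its reference
count, falls with its size, and an "inflation" value ages out entries
that were popular long ago:

    import heapq

    class GDSFCache:
        # Toy GreedyDual-Size-Frequency cache; illustrative only.
        def __init__(self, capacity_bytes):
            self.capacity = capacity_bytes
            self.used = 0
            self.L = 0.0          # "inflation" value; ages old entries
            self.entries = {}     # key -> (priority, frequency, size)
            self.heap = []        # (priority, key) eviction candidates

        def _priority(self, freq, size):
            # Popular objects rank higher; large objects rank lower.
            return self.L + float(freq) / size

        def get(self, key):
            entry = self.entries.get(key)
            if entry is None:
                return False      # miss; caller fetches, then calls put()
            _, freq, size = entry
            pri = self._priority(freq + 1, size)
            self.entries[key] = (pri, freq + 1, size)
            heapq.heappush(self.heap, (pri, key))
            return True           # hit

        def put(self, key, size):
            if key in self.entries or size > self.capacity:
                return
            while self.used + size > self.capacity:
                pri, victim = heapq.heappop(self.heap)
                cur = self.entries.get(victim)
                if cur is None or cur[0] != pri:
                    continue      # stale heap record; skip it
                self.L = pri      # evicted priority inflates future ones
                self.used -= cur[2]
                del self.entries[victim]
            pri = self._priority(1, size)
            self.entries[key] = (pri, 1, size)
            heapq.heappush(self.heap, (pri, key))
            self.used += size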

Like Jamie, I tend to believe that the current caching paradigm is
broken.  It relies on a community of users having sufficiently similar
patterns of use to populate a cache with resources which will be
re-used; in most cases, that doesn't happen often enough to be worthwhile,
except in instances where the resources are very expensive to get
(trans-oceanic links etc.) or where the cache and the aggregated user
community are very large indeed.
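
One way to see why: even with an infinitely large shared cache and no
expiry, the hit rate is capped by how much the community's requests
actually overlap.  A quick back-of-the-envelope simulation (Python;
the object counts and Zipf parameters below are invented for
illustration, not measurements):

    import random

    def zipf_requests(n_objects, alpha, n_requests, rng):
        # Draw requests from a Zipf-like popularity curve; a higher alpha
        # means interest concentrates on fewer objects (more overlap).
        weights = [1.0 / rank ** alpha for rank in range(1, n_objects + 1)]
        return rng.choices(range(n_objects), weights=weights, k=n_requests)

    def best_case_hit_rate(requests):
        # Infinite cache, no expiry: every repeat request is a hit, so
        # this is an upper bound on what any real proxy could achieve.
        seen, hits = set(), 0
        for r in requests:
            if r in seen:
                hits += 1
            seen.add(r)
        return hits / len(requests)

    rng = random.Random(1999)
    for alpha in (0.5, 0.8, 1.1):
        reqs = zipf_requests(100000, alpha, 50000, rng)
        print("alpha=%.1f  best-case hit rate %.0f%%"
              % (alpha, 100 * best_case_hit_rate(reqs)))

The less concentrated the popularity curve, the lower that ceiling
sits, before you even account for expiry, uncacheable responses, or
finite cache sizes.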

At a BOF at the last IRCACHE workshop, a group of us discussed the
idea of creating a caching system that acts on behalf of the content
provider rather than the user (an outward-facing "surrogate" instead
of an inward-facing "proxy").  This paradigm relies on the fairly
well-documented phenomenon of "flash crowds" or "CNN events": the
users accessing a particular content provider will tend to request a
highly overlapping set of resources over short time intervals.  This
reflects my experience as a NASA web guy, as well as the experience
of some of the web hosting providers in the room at the time.

You won't always get the high overlap rates of a CNN event, of course,
but it seems worth checking whether we can do better than the rates
for proxy caches.  Surrogates have their own problems too, but
they do solve some of the traditional proxy issues like hit metering
and authentication (since the surrogate operator has a prior business
relationship with the content provider).
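
To make the surrogate idea concrete, here is a minimal sketch of one
(Python; ORIGIN, the port, and the in-memory structures are
placeholders, and a real surrogate would need expiry, Vary handling,
and much more).  The point is that the cache fronts a single origin
it has a business relationship with, so accurate hit reporting back
to the provider falls out naturally:

    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    ORIGIN = "http://origin.example.net"  # placeholder: the provider we front
    cache = {}     # path -> (status, content-type, body)
    hit_log = []   # every request, reportable back to the provider

    class Surrogate(BaseHTTPRequestHandler):
        def do_GET(self):
            entry = cache.get(self.path)
            if entry is None:
                # Miss: fetch from the origin we represent, keep a copy.
                with urllib.request.urlopen(ORIGIN + self.path) as resp:
                    entry = (resp.status,
                             resp.getheader("Content-Type"),
                             resp.read())
                cache[self.path] = entry
            status, ctype, body = entry
            # Hit metering comes for free: the operator logs each request
            # and can hand the provider an accurate access report.
            hit_log.append((self.client_address[0], self.path))
            self.send_response(status)
            if ctype:
                self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8080), Surrogate).serve_forever()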

This discussion and work continue on the mailing list
"[email protected]" (majordomo syntax to the -request address).
The URL of the original BOF info is
http://workshop.ircache.net/BOFs/bof2.html, for those who are
interested.
		regards,
			Ted Hardie
			Equinix