North American Network Operators Group


Re: NNTP servers

  From: Joe St Sauver
  Date: Tue May 06 21:50:42 2003

Hi Drew,

#            Howdy, if this is off-topic I certainly apologize however I
#believe that running an NNTP server is usually part of a 'network
#operations' sphere of influence. 

Dunno about that, but I'll chime in with a couple of ideas, just because the 
volumes of NNTP traffic involved have gotten to the point where the traffic 
alone is probably operationally interesting, everything else aside. 

#I have a few basic questions. Does anyone
#know off hand how much disk is needed for a fairly respectable NNTP server
#for a full feed? 

Daily Usenet volume is extremely sensitive to decisions about carrying (or 
not carrying) even a single group. See, for example, 
http://www.newsadmin.com/top100bytes.htm which shows the top half dozen 
groups (by bytes posted) running daily volumes of:

     Binary Newsgroup                           Bytes   % Total
     ----------------------------------------------------------
1    alt.binaries.dvdr                 30,304,095,023    5.893
2    alt.binaries.cd.image.xbox        25,796,723,944    5.017
3    alt.binaries.dvd                  19,428,583,576    3.778
4    alt.binaries.multimedia           17,783,671,185    3.459
5    alt.binaries.cd.image.games       15,303,064,035    2.976
6    alt.binaries.svcd                 14,780,524,967    2.874

[commas added to byte counts for improved legibility] Yes, carrying or not
carrying a single group can have a 30GB/day impact. 
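
As a quick sanity check, those per-group figures are self-consistent: divide 
any row's byte count by its share of the total and you get the same implied 
overall daily volume. A throwaway Python sketch of that arithmetic, using 
only the numbers from the table above:

  # Each row's bytes divided by its share of the total implies the same
  # overall daily posted volume (figures from the newsadmin.com table above).
  top_groups = [
      ("alt.binaries.dvdr",          30_304_095_023, 5.893),
      ("alt.binaries.cd.image.xbox", 25_796_723_944, 5.017),
      ("alt.binaries.dvd",           19_428_583_576, 3.778),
  ]

  for name, daily_bytes, pct_of_total in top_groups:
      implied_total = daily_bytes / (pct_of_total / 100.0)
      print("%-30s implies ~%.0f GB/day total" % (name, implied_total / 1e9))

  # Every row works out to roughly 515 GB/day of articles posted, which is
  # in the same ballpark as the full-feed figures below.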

Yes, daily traffic for a fullish feed *has* peaked in excess of 600GB and 
3 million articles/day. If you want to carry "everything," you could 
multiply ~0.6TB/day by the number of days' retention you want to keep; 
note, however, that over time your retention will drift downward as volumes 
continue to increase. 

Also note that this is just raw article storage space, and does not 
include space for article overview data, history files, etc.
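
To make that arithmetic concrete, here's a minimal capacity-planning sketch 
in Python (the 10% overhead factor for overview/history data is purely an 
assumption on my part -- substitute your own measurements):

  def spool_gb_needed(daily_volume_gb, retention_days, overhead_factor=1.1):
      """Raw article bytes for the retention window you want, padded by an
      assumed factor for overview data, history files, etc."""
      return daily_volume_gb * retention_days * overhead_factor

  def retention_days(spool_gb, daily_volume_gb):
      """The same arithmetic run backwards: how long a fixed spool lasts.
      As daily volume grows, this number drifts downward."""
      return spool_gb / daily_volume_gb

  print(spool_gb_needed(600, 10))   # ~6600 GB for 10 days of a 600 GB/day feed
  print(retention_days(2000, 600))  # a fixed 2 TB spool holds ~3.3 days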

Daily Usenet volume (in bytes) is also exceptionally sensitive to the 
maximum article size you accept, with the 80/20 rule roughly holding between 
article count and byte traffic (i.e., 80% of the articles by count will 
require only about 20% of the transfer bandwidth). If your goal is to live 
within a given bandwidth budget, or to efficiently utilize a disk array of a 
particular size, you can readily adjust your total article payload/day by 
dialing down the maximum article size you elect to accept. 
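
If you want to see how a given cap would play out against your own feed, 
something along these lines makes the tradeoff concrete (a sketch only; the 
sample sizes at the bottom are purely hypothetical -- feed it per-article 
sizes pulled from your own incoming logs instead):

  def accepted(sizes_bytes, max_article_bytes):
      """How many articles, and how many bytes, survive a size cap.
      Because a small fraction of large binaries carries most of the bytes,
      a modest cap keeps most articles while shedding much of the volume."""
      kept = [s for s in sizes_bytes if s <= max_article_bytes]
      return len(kept), sum(kept)

  # Purely hypothetical mix: lots of small text posts, a few huge binaries.
  sizes = [2_000] * 8_000 + [500_000] * 1_500 + [50_000_000] * 500
  total_n, total_b = len(sizes), sum(sizes)
  for cap in (1_000_000, 10_000_000, 100_000_000):
      n, b = accepted(sizes, cap)
      print("cap %9d: %3.0f%% of articles, %3.0f%% of bytes" %
            (cap, 100.0 * n / total_n, 100.0 * b / total_b))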

In case you doubt these volume stats, a couple of sites with publicly 
accessible daily traffic summaries include:

http://nntp.abs.net/cyclone/stats/
http://informatie.wirehub.net/news/bambam/diablo.html
http://newsfeed.media.kyoto-u.ac.jp/innreport/

I would note that most "full" feeds today really *AREN'T* full, however. 

#Also is IDE still too slow/unreliable for this type of
#operation? I know back when we got our current server IDE was very slow it
#has sped up a bit since then. 

Choice of file system and storage methodology can be as critical as, or more 
critical than, whether you're using IDE or SCSI. The days when traditional 
article-per-file spools on UFS file systems would work are definitely 
gone -- cyclical news file systems on top of ReiserFS are a popular recipe 
today.
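
For what it's worth, the appeal of the cyclical approach (CNFS, in INN terms) 
is that the spool never fills: new articles overwrite the oldest ones in a 
fixed set of preallocated buffers, so retention floats with volume instead of 
the server falling over when the disk is full. A toy Python model of that 
behavior (an illustration of the idea only, not how any particular news 
server implements it):

  from collections import deque

  class CyclicSpool:
      """Fixed byte budget; storing a new article evicts the oldest
      articles as needed, so retention is whatever the budget buys."""
      def __init__(self, capacity_bytes):
          self.capacity = capacity_bytes
          self.used = 0
          self.articles = deque()            # (article_id, size), oldest first

      def store(self, article_id, size):
          while self.articles and self.used + size > self.capacity:
              _, old_size = self.articles.popleft()   # oldest gets overwritten
              self.used -= old_size
          self.articles.append((article_id, size))
          self.used += size

  # 100 MB toy spool fed 200 one-megabyte articles: only the newest 100 remain.
  spool = CyclicSpool(100 * 1024 * 1024)
  for i in range(200):
      spool.store(i, 1024 * 1024)
  print(len(spool.articles), spool.articles[0][0])    # -> 100 100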

#The reason I am asking is because it has come
#time for the old NNTP server to be buried somewhere in the mountains and for
#me to procure a new one.  Currently we are running a P3 600 /w about 200 GB
#of storage on Solaris and Typhoon, the reason we are replacing this server
#is for the poor performance and its abhorrent retention.

If you're planning to work with a full feed, you won't regret getting as much
CPU, memory, disk, and network connectivity as you can afford. I don't want
to get into hardware/OS/server religious wars, so I'll skip any specifics
here (although feel free to contact me offlist if you're interested in talking
about some starting points for hardware options that seem to work okay).

Regards,

Joe St Sauver ([email protected])
University of Oregon Computing Center