North American Network Operators Group

Re: Scalable Mail solution with NAS

  • From: Matthew Zito
  • Date: Wed Jan 31 15:25:22 2001

On Wed, 31 Jan 2001, Sebastien Berube wrote:
>
> >
> > If you're looking for large scalability AND high performance, my
> > preferred solution would be to have a relational database as the backend,
> > but don't store any messages in it - simply pointers to their location on
> > disk.  Then store the messages without regard to intended username in a
> > hashed directory structure.   The pop3 server then gets the list of new
> > messages from the database server, which could just be a list of
> > filenames.  Then, the pop3 server simply has to open the message to
> > return it - it doesn't have to do an opendir().  Also, if you use the
> > filename as the UIDL returned, there's no need to even stat() the file,
> > again saving you a whole nfs call.   The obvious downside is that you
> > can't do a:
> >
> > rm -f /users/j/o/h/n/johndoe.mbx
> >
> > But, with 200k mailboxes, you should have an automated way to do that
> > anyway.
>
> It also makes backups a nightmare.  In that case, you'll have to shut down
> the entire mail system before you can back it up, or you'll have a database
> image which won't represent the actual data you have on your NAS.
>

No, no, don't do that.  Given the scale of something like this, I'd expect 
you'd be running on something like Oracle that supports the concept of "hot 
backups".  The table spaces are put into a quiesced state, and all writes are 
done to memory and to recovery logs.  Once the backup is finished, you take 
it out of hot backup and it then writes all the pending transactions to the 
database files. That way, the database files are stable, and you also back up 
the recovery logs to something with real-time access (like another nfs 
server).  In the event you have a catastrophic database failure, you recover
from tape (or if you have the space, you have a copy of the dbf files 
elsewhere), and run all the transaction logs - it takes about 5 minutes per 
hour of transactions.  Then your database is brought up to the point where it 
was when it died.  The worst-case scenario is that a few transactions
don't get logged, which means that a few emails get dropped.  If you had
a stock smtp server that died, you could be looking at the same situation.
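
As a rough sizing aid, the replay estimate above can be written down as
simple arithmetic.  This is only a sketch: the 5-minutes-per-hour rate is the
ballpark figure quoted above, not a measured value, and the function name is
made up for illustration.

```python
def estimated_replay_minutes(hours_of_logs, minutes_per_hour=5.0):
    """Estimate how long replaying transaction logs takes after a restore.

    minutes_per_hour is an assumed rate (about 5 minutes of replay per hour
    of logged transactions); measure your own system before relying on it.
    """
    if hours_of_logs < 0:
        raise ValueError("hours_of_logs must be non-negative")
    return hours_of_logs * minutes_per_hour


# A nightly backup means at most about 24 hours of logs to replay.
print(estimated_replay_minutes(24))
```

At that rate, a full day of logs replays in roughly two hours, which is one
argument for backing up the database files more often than nightly.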

As far as backing up the actual mailboxes, there's no way to get around the
fact that it'll take long enough to finish that stuff will be inaccurate by
the time it's finished.  If you ever have to restore the mailboxes from tape
without restoring the database, it'd be wise to have an application that
builds a list of the messages that are on disk that the database doesn't know
about.
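
A minimal sketch of such a reconciliation pass, assuming the database can
hand back the message paths it knows about, relative to the spool root.  The
function and parameter names here are hypothetical, and the database query is
left out on purpose.

```python
import os


def find_orphaned_messages(spool_root, known_paths):
    """Return message files under spool_root that the database has no pointer to.

    known_paths is assumed to be an iterable of paths relative to spool_root,
    as fetched from the database by whatever query fits your schema.
    """
    known = set(known_paths)
    orphans = []
    # Walk the whole hashed tree once and compare each file to the known set.
    for dirpath, _dirnames, filenames in os.walk(spool_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), spool_root)
            if rel not in known:
                orphans.append(rel)
    return sorted(orphans)
```

The resulting list could then be re-inserted into the database or parked for
manual review, depending on how much you trust the restored files.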

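For reference, the hashed directory layout from the quoted message could look
something like the sketch below.  The choice of SHA-1, two directory levels,
and the function name are all assumptions for illustration; the original
proposal doesn't specify a hash or a depth.

```python
import hashlib
import os


def message_path(spool_root, message_id, levels=2):
    """Map a message identifier to a file path in a hashed directory tree.

    The file name is the hex digest of the identifier, so it depends only on
    the message, not on the recipient's username.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    # Use the leading byte pairs of the digest as nested directory names.
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return os.path.join(spool_root, *parts, digest)


print(message_path("/spool", "<20010131152522.A1234@example.com>"))
```

The 40-character hex file name is short enough to double as the POP3 UIDL,
so the server can answer UIDL straight from the database's list of filenames
without touching the file.
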
Thanks,
Matt

-- 
Matthew J. Zito
Systems Engineer
Register.com, Inc., 11th Floor, 575 8th Avenue, New York, NY 10018
Ph: 212-798-9205
PGP Key Fingerprint: 4E AC E1 0B BE DD 7D BC  D2 06 B2 B0 BF 55 68 99