North American Network Operators Group

Re: Scalable Mail solution with NAS
On Wed, 31 Jan 2001, Matthew Zito wrote:

> On Wed, 31 Jan 2001, Eric Sobocinski wrote:
>
> > At 11:06 AM -0500, 01/31/2001, Sebastien Berube wrote:
> > >One way to fix this
> > >issue would be to use a hashing scheme to split the amount of actual
> > >mailboxes into a subdirectory structure. You could get something like
> > >
> > >[email protected] would have his mailbox in
> > >
> > >/export/mailboxes/j/o/h/n/johndoe.mbox
> > >
> > >so in /export/mailboxes, in order to find the j directory, you only have
> > >about 36 directory entries or so.
> > >
> > >Although this example is not good in the case where you accept usernames
> > >with 3 or fewer characters.
> >
> > It's not hard to right-pad any short usernames before hashing. For
> > instance, the username "bo" might hash as "bo__" and thus would end up in
> > the directory "/export/mailboxes/b/o/_/_/bo.mbox". If you allow
> > non-alphanumerics you'll want to translate those to something innocuous as
> > well, or a name such as "bo.lee" will cause problems.
>
> Well, hashing like that works well from the standpoint that it's very easy
> for the software to find the mailbox. It's going to make things like backups
> very costly, though, because of all the recursive directories. Also, you're
> going to end up with some directories very imbalanced, since there are more
> frequently occurring names.

In order to remedy this rather easily, you can always run the username
through a hashing function and use the first 'n' letters of the hash to
figure out what directory the mail(box|dir) is in. That also prevents
problems with non-alphanumeric characters such as "."

> If you're going to use NFS, you probably want to use something like maildir
> format - which is nfs-safe but becomes very costly as the number of messages
> increases. A lot of that has to do with the performance of the remote nfs
> server - the underlying filesystem's performance in reading large directories
> will make a BIG difference as far as that goes. Netapps have excellent
> large-directory performance, fwiw.
>
> If you're looking for large scalability AND high performance, my preferred
> solution would be to have a relational database as the backend, but don't
> store any messages in it - simply pointers to their location on disk. Then
> store the messages without regard to intended username in a hashed directory
> structure. The pop3 server then gets the list of new messages from the
> database server, which could just be a list of filenames. Then, the pop3
> server simply has to open the message to return it - it doesn't have to do an
> opendir(). Also, if you use the filename as the UIDL returned, there's no
> need to even stat() the file, again saving you a whole nfs call. The
> obvious downside is that you can't do a:
>
> rm -f /users/j/o/h/n/johndoe.mbx
>
> But, with 200k mailboxes, you should have an automated way to do that anyway.

It also makes backups a nightmare. In that case, you'll have to shut down
the entire mail system before you can back up, or you'll have a database
image which won't represent the actual data you have on your NAS.

> Thanks,
> Matt

--
Sebastien Berube
Operation Center Systems Administrator
[email protected]

In Gary we trust.
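
For illustration only, here is a rough Python sketch of the "run the username through a hashing function and use the first 'n' letters" idea from the reply above. The function name mailbox_path, the choice of SHA-256, the default of two directory levels, and the .mbox suffix are all assumptions made for the example, not anything specified in the thread.

```python
import hashlib
from pathlib import PurePosixPath


def mailbox_path(username: str,
                 root: str = "/export/mailboxes",
                 levels: int = 2) -> PurePosixPath:
    """Return a mailbox path bucketed by a hash of the username.

    Hypothetical helper: one directory level per leading hex character
    of the digest, so each level holds at most 16 entries.
    """
    # Hash the username so the bucket does not depend on which letters
    # names tend to start with, or on characters such as "." in the name.
    digest = hashlib.sha256(username.encode("utf-8")).hexdigest()
    buckets = list(digest[:levels])
    # The file name still carries the raw username; a real system would
    # have to reject or escape characters such as "/" before this point.
    return PurePosixPath(root, *buckets, f"{username}.mbox")


if __name__ == "__main__":
    for name in ("johndoe", "bo", "bo.lee"):
        print(name, "->", mailbox_path(name))
```

Because the buckets come from the digest rather than from the name itself, short usernames need no padding, and the directories should fill far more evenly than with a first-letters scheme.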
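
The database-as-pointer-index approach quoted above could look roughly like the following. This is a minimal sketch that uses SQLite only so the example is self-contained; the table name messages, its columns, and the helpers open_index, record_delivery and list_uidls are hypothetical, and the thread does not name a particular database or schema.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the pointer index. No message bodies live here."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " username TEXT NOT NULL,"
        " filename TEXT NOT NULL UNIQUE)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS by_user ON messages (username)")
    return db


def record_delivery(db: sqlite3.Connection, username: str, filename: str) -> None:
    """Record that a message file on the NAS belongs to a user."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO messages (username, filename) VALUES (?, ?)",
            (username, filename),
        )


def list_uidls(db: sqlite3.Connection, username: str) -> list:
    """Return the user's message filenames, which double as POP3 UIDLs.

    The POP3 server can answer a listing from this query alone and then
    open() each file directly, without an opendir() or stat() over NFS.
    """
    rows = db.execute(
        "SELECT filename FROM messages WHERE username = ? ORDER BY rowid",
        (username,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = open_index()
    record_delivery(db, "johndoe", "3f/a9/msg-0001")
    record_delivery(db, "johndoe", "c1/07/msg-0002")
    print(list_uidls(db, "johndoe"))
```

The backup concern raised in the reply applies directly here: the index and the message files are two separate stores, so a consistent backup has to capture both at the same point in time.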