North American Network Operators Group

Re: Internet email performance study

  • From: Robert Beverly
  • Date: Thu Apr 28 16:43:37 2005

On Thu, Apr 28, 2005 at 11:21:07PM +0300, aljuhani wrote:
> Another possibility is that the domains you are monitoring are on dynamic IP
> addresses that change all the time, and the gap when they become
> non-responsive could be due to a delay in updating the DNS roots with the
> new IP address.
> It could also be a non-dedicated mail server, meaning that the server is
> also used for web and DNS and, when overloaded, tries to shed some load;
> usually the first service to be disabled is SMTP.
> 
> Or the domain has a lower-priority mail server which happens to be
> down for maintenance, but your DNS server is caching the data (IP address)
> of that mail server. That should not happen, as it has to retry the other
> MX record, but it remains a possibility.

Hi aljuhani,

We do not consider an email lost unless it was first successfully
delivered to a server (a server is defined by its IP address; the IP
address, not the domain, is the atomic unit of testing).  By this I mean
that during the SMTP exchange we monitor the response codes we get back,
and we only count an email as lost if we receive no bounce-back after
a complete series of positive response codes.  If we can't connect to
a server, as in the scenarios you describe above, we never consider
the email successfully delivered, and hence it can never be called lost.
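
To make that rule concrete, here is a minimal Python sketch.  It is an
illustration only, not the code used in the study: the names Outcome,
accepted and classify are hypothetical, and it assumes that "positive"
means any 2xx or 3xx SMTP reply code.

```python
from enum import Enum


class Outcome(Enum):
    NOT_DELIVERED = "not delivered"  # never counted as lost
    BOUNCED = "bounced"              # delivered, and the bounce came back
    LOST = "lost"                    # delivered, but no bounce ever arrived


def accepted(reply_codes):
    """True if the exchange produced replies and all of them were positive.

    Assumption: 2xx (completion) and 3xx (intermediate, e.g. 354 after
    DATA) count as positive. An empty list models a failed connection.
    """
    return bool(reply_codes) and all(200 <= code < 400 for code in reply_codes)


def classify(reply_codes, bounce_received):
    """Classify one test email sent to one server IP address."""
    if not accepted(reply_codes):
        return Outcome.NOT_DELIVERED
    return Outcome.BOUNCED if bounce_received else Outcome.LOST
```

A server we cannot reach yields an empty list of reply codes, so it
falls into NOT_DELIVERED and never contributes to the loss count.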

> I have not yet looked at the details on your URL, but there are a number of
> things to consider when doing such a survey.
> 
> 1.  Where is your monitoring server located in relation to the servers /
> domains being monitored?
> You need to establish a datum for how far away that server or domain is,
> using PING to see how long a packet takes on the round trip, just to rule
> out networking / routing issues that may interfere with the results you
> need for the responses of MTAs.
>  
> 2. Study that domain using dig to find its MX records and DNS servers, and
> whether there are backup DNS servers somewhere near your network.
>
> 3. Of course as indicated above, you need to find out if the IP of that
> domain is static or dynamic.

Yes, our preprocessing step involves separating a domain into all
of the constituent IP addresses of MXes serving that domain.  The
paper has lots of details on this, but the major point again is
that we are testing IP addresses rather than domains.
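
As a rough illustration of that preprocessing step (a sketch only; the
paper describes the real procedure), the Python below expands domains
into the IP addresses that become test targets.  The function name
expand_to_targets and the injected lookup_mx / lookup_addrs callables
are hypothetical, and the DNS data is hard-coded from the documentation
ranges so the example runs without a resolver.

```python
def expand_to_targets(domains, lookup_mx, lookup_addrs):
    """Map each MX IP address to the set of domains it serves.

    lookup_mx(domain) -> iterable of MX host names (assumed helper)
    lookup_addrs(host) -> iterable of IP addresses (assumed helper)
    """
    targets = {}
    for domain in domains:
        for mx_host in lookup_mx(domain):
            for ip in lookup_addrs(mx_host):
                # The IP address, not the domain, is the unit of testing.
                targets.setdefault(ip, set()).add(domain)
    return targets


# Hypothetical stand-ins for real MX and address lookups.
MX = {
    "example.com": ["mx1.example.com", "mx2.example.com"],
    "example.org": ["mx1.example.com"],
}
ADDRS = {
    "mx1.example.com": ["192.0.2.10"],
    "mx2.example.com": ["192.0.2.20", "192.0.2.21"],
}

targets = expand_to_targets(
    MX,
    lambda domain: MX.get(domain, []),
    lambda host: ADDRS.get(host, []),
)
```

Because the lookups are done once up front, DNS resolution is out of
the picture while the tests themselves are running.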

> 4. Also, you need to monitor the load on your own server and DNS responses.

DNS is not an issue for testing: as I said above, we preprocess all
of the domains to generate the IP addresses of MTAs, which are the
atomic unit of testing.  As for the load, we did rather extensive load
testing on our own server before putting the system into production.
 
> What I'd suggest is to use MRTG to monitor the round-trip time using PING on
> the servers being monitored, so you have real live data that helps in
> establishing your final findings.

I'm not entirely certain that MRTG records would provide any useful
data, at least not at the granularity that would be needed to say
anything definitively.
 
> Also, do not forget that some MTA users have their SMTP behind a filter
> that rejects SMTP traffic that does not behave normally during the SMTP
> greeting.

Yes, our SMTP greetings are valid and up to spec.  Again, it's the
non-deterministic loss that we're most concerned about.  If there
were a problem with the SMTP exchange, we would see our emails
always rejected (for instance).  Our measurement study only includes
emails that were successfully delivered (indicated by a complete
series of successful status codes returned during the SMTP exchange).

Many thanks,

rob