North American Network Operators Group

Re: Traceroute versus other performance measurement

  • From: Daniel Senie
  • Date: Wed Nov 29 13:41:11 2000

----- Original Message -----
From: <[email protected]>
To: <[email protected]>; <[email protected]>; <[email protected]>
Cc: <[email protected]>
Sent: Wednesday, November 29, 2000 1:17 PM
Subject: Re: Traceroute versus other performance measurement


> Daniel Senie writes:
>
> | Programs such as pathchar can AT MOST tell you about latency, not about
> | bandwidth.
>
> Well, this is simply wrong.
>
> The theory of operation for pathchar is very simple: it attempts
> to build a queue at an interface, and measure the amount of time
> it takes two back-to-back packets to pass through.   The law of
> large numbers says that for any interface, given enough traffic
> emitted from pathchar, there will be a time when pathchar will
> successfully observe the minimum packet inter-arrival time, and/or
> the minimum delay for a set of varying-length packets, either of
> which will indicate the bottleneck bandwidth.
>
> Pathchar is robust against nearly everything except a bottleneck
> mismatch[*]: trying to measure a bottleneck (interface) that is faster
> than one closer to the pathchar-running host is subject to huge errors,
> and the very clever maths used to improve the SNR in the presence of the
> nearer, slower interface is sometimes just insufficient.
>
> | Any cases where links are in parallel (e.g. multilink PPP of
> | multiple ISDN or T1 lines, or trunked Ethernet links) will typically NOT
> | show up in the calculations,
>
> Simple logic tells us that this doesn't matter: you end up either
> measuring the bottleneck bandwidth of the aggregate of the multiple
> paths, or the bottleneck bandwidth of a single component, depending
> on how the load-balancing works.  Pathchar tries to avoid measuring
> only the component bandwidth.
>
> More interesting is non-parallel equal-cost paths, and pathchar
> does tricks to measure the various components as can be seen;
> the problem is that there are non-parallel equal-cost paths that
> are invisible (tunnels of whatever sort, of which MPLS is a bad variety).
>
> Your complaint about this would be reasonable if pathchar weren't
> trying to measure the path characteristics that would be seen by
> a flow ORIGINATING AT THE PATHCHAR TEST BOX.  If in the multiple-path
> case such a flow is constrained to a single component, then pathchar
> is correct to report that.
>
> IOW, yes, pathchar is poor at identifying some types of network
> infrastructure, but that is not its job.  It is very good at its
> job, which is indicating the bottlenecks from source to destination,
> and giving a very good guess at the bottleneck bandwidths.
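
The "minimum delay for a set of varying-length packets" idea described above can be sketched as follows. This is a simplified illustration, not pathchar's own code: the function name is hypothetical, it covers a single path rather than pathchar's hop-by-hop TTL probing, and it assumes the reply packet is a fixed size, so only the probe's own serialization time varies with its length. Queueing can only add delay, so the fastest RTT seen for each size approximates the no-queue delay, and the slope of those minima against size is seconds per byte.

```python
from collections import defaultdict


def estimate_bottleneck_bps(samples):
    """Estimate bandwidth from (packet_size_bytes, rtt_seconds) probe samples.

    Hypothetical helper. Assumes queueing only ever adds delay, so the
    minimum RTT per packet size is the best estimate of the no-queue delay.
    """
    # Keep only the fastest observation for each probe size.
    best = defaultdict(lambda: float("inf"))
    for size, rtt in samples:
        best[size] = min(best[size], rtt)

    sizes = sorted(best)
    if len(sizes) < 2:
        raise ValueError("need at least two distinct packet sizes")
    xs = [float(s) for s in sizes]
    ys = [best[s] for s in sizes]

    # Ordinary least-squares slope of minimum RTT against packet size.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    seconds_per_byte = sxy / sxx

    return 8.0 / seconds_per_byte  # bits per second
```

With enough probes per size, the occasional unqueued probe is what the law-of-large-numbers argument relies on; the minimum filter throws away everything else.
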

All the theory sounds great. Now suppose you've got a customer who uses the
utility to test a circuit between two boxes, then calls to complain that he's
only seeing half of the expected bandwidth, because pathchar tells him he's
getting X and we said we provisioned 2X. Perhaps it's just a customer
education issue.

I think you're making assumptions about how load is shared on parallel
links. Often this is done by hashing the IP address or MAC address of the
packets as a way to ensure there will be no packet reordering issues on the
parallel links. You can send traffic until you clog one of the two pipes,
but you will never cause any of it to spill over to the other link.
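
A minimal sketch of the per-flow hashing described above, assuming a hypothetical balancer that hashes only the source/destination address pair. Real routers and switches use vendor-specific hash inputs and functions; CRC-32 over the address pair is just a stand-in. Because every packet of a test between the same two boxes carries the same addresses, all of it lands on one member link.

```python
import zlib


def pick_link(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Hypothetical per-flow load balancer for n_links parallel links.

    Hashes the address pair (CRC-32 is an illustrative stand-in for a
    vendor hash) so a given flow always maps to the same member link,
    which avoids reordering but caps that flow at one link's bandwidth.
    """
    key = f"{src_ip}|{dst_ip}".encode("ascii")
    return zlib.crc32(key) % n_links


# Every probe between the same two test boxes hashes to the same link,
# so a tool run from one host sees only one link's worth of bandwidth.
links_used = {pick_link("192.0.2.10", "198.51.100.20", 2) for _ in range(1000)}
assert len(links_used) == 1
```
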

>
> | This compounds other issues with trying to determine path characteristics
> | with such tools, most especially (and as others mentioned) asymmetric paths.
>
> On the contrary; if real live traffic (which pathchar generates)
> observes path flutter over finite time, then other real live traffic
> (as generated by users) also likely will flutter over finite time.

You're making the assumption that you'll see change in the path FROM THE ONE
STARTING POINT where you're running pathchar. This is simply not going to
happen in many cases. Equal cost multipath, trunking and even unequal cost
pathing will result in you seeing only a part of the picture.

>
> This is backed up by other observations, such as Vern Paxson's,
> that attempt to characterize the routing, delay and loss aspects
> of the Internet over long periods of time (taking advantage of
> the law of large numbers).    Pathchar just works faster and tries
> to answer the question of bottleneck bandwidth, and make educated
> guesses about the bandwidths of subsequent bottlenecks.
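
Guessing the bandwidths of successive hops can be sketched as slope differencing. This is an illustrative simplification, not pathchar's actual code: it assumes you already have a minimum-RTT-versus-size slope (seconds per byte) for each hop, measured from the same host, and that each link adds its own serialization time to the slope. The function name is hypothetical.

```python
def per_hop_bandwidths(cumulative_slopes):
    """Turn per-hop min-RTT slopes into per-link bandwidth guesses.

    Hypothetical helper. cumulative_slopes[i] is assumed to be the slope
    (seconds per byte) of minimum RTT versus probe size measured to hop i+1.
    The increase from one hop to the next is attributed to the link between
    them; returns bits per second per link, or None where noise made the
    slope fail to increase.
    """
    estimates = []
    previous = 0.0
    for slope in cumulative_slopes:
        delta = slope - previous
        # A non-positive difference yields no meaningful bandwidth.
        estimates.append(8.0 / delta if delta > 0 else None)
        previous = slope
    return estimates
```

This also shows why the bottleneck mismatch hurts: a fast link behind a slow one contributes only a small difference between two large, noisy slopes.
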

Yet people run the tools and believe the results, even though the results
aren't telling them the truth. This is backed up by observations in the real
world, customer complaints and all.

>
> The paper is quite good at describing a lot of the theory of operation,
> http://www.caida.org/tools/utilities/others/pathchar/
> and deals explicitly with some of Daniel Senie's objections.
>
> Sean.
> --
> [*] it is also not robust against "slow path" bottlenecks, which occur
>     when the test traffic is treated substantially differently than "real"
>     traffic, although since test traffic "through" a router en route to
>     a subsequent hop is _unlikely_ to be treated differently (as compared
>     with test traffic _to_ the same router), one can filter out undesirable
>     artefacts to some degree by using data collected by measuring "end-to-end".
>