North American Network Operators Group

Re: Can you explain why paths to same host diverge?

  • From: John Hawkinson
  • Date: Fri Feb 27 20:16:48 1998

> The RTTs from 128.114.2.91 are always much higher (many times a full order
> of magnitude) than those of 128.114.2.53.  The explanation for the
> differing RTTs turns out to be that the paths between each of my machines
> and www.uu.net are different.  In the below traceroutes, everything
> agrees up through hop five.  But at hop six, the paths diverge:

Current code from cisco in the branch supporting Cisco Express Forwarding
(CEF -- not publicly available) that many NSPs are running has support
for multiple path selection based on a hash of the source and destination
address of a packet.

I thought there was a discussion of this on NANOG around September
of last year, but I cannot seem to find it, so I'm forced to conclude it
either didn't happen or happened somewhere else :-)
[...pause...]
Ah yes, it took place on end2end-interest (subscriptions to [email protected],
I believe, but possibly end2end-interest-request).

See ftp.isi.edu /pub/end2end/end2end-interest-1997.mail, specifically
the thread entitled "Multi-Path Routing", beginning on 1 Oct, about
byte offset 3070749 into the archive.

The short summary is that in the event of multiple paths and CEF, when
src/dst hashing is configured (as opposed to per-packet load
balancing), the router hashes on the source and destination address
and picks one of the multiple paths. This is deterministic and as far
as we know is the best sort of layer-three load-balancing available;
it has far nicer effects on traffic than per-packet load balancing,
and ensures far better balances than destination-based hashing alone.
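
To make the idea concrete, here is a minimal sketch of src/dst hashing
over a set of equal-cost paths. The hash (an XOR of the two addresses,
folded down to one byte) and the names pick_path, "path-A" and "path-B"
are assumptions chosen for illustration; this is not the hash CEF
actually uses, which is not described in this message.

```python
import ipaddress

def pick_path(src, dst, paths):
    """Pick one of several equal-cost paths from a hash of (src, dst).

    Illustrative only: the XOR-fold below is an assumed hash, not the
    function any real router implements.
    """
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    h = s ^ d
    # Fold the 32-bit value so every octet influences the low bits.
    h ^= h >> 16
    h ^= h >> 8
    return paths[h % len(paths)]

if __name__ == "__main__":
    paths = ["path-A", "path-B"]  # hypothetical next hops
    for src in ("128.114.2.53", "128.114.2.91"):
        print(src, "->", pick_path(src, "199.170.0.30", paths))
```

The point is only that the choice is a pure function of the address
pair: a given pair always takes the same path, so packets within a flow
are not reordered, while different sources talking to the same
destination may be spread across paths.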

I feel obligated to ask -- is there some reason you didn't
direct your query to Sprint, before asking NANOG? It really seems like
this is the kind of question they should be able to answer for you,
and diagnose the problem to some extent. I can't see a good reason to ask
here without asking the providers in question, first.

> from 128.114.2.53:
> 
>    traceroute to www.uu.net (199.170.0.30): 1-30 hops, 38 byte packets
>     1  comm-g.UCSC.EDU (128.114.2.252)  4.52 ms
>     2  frontdoor.UCSC.EDU (128.114.103.1)  1.15 ms
>     3  UC-net-dmz.ucsc.edu (208.1.176.6)  1.45 ms
>     4  bgty-lata01.ucnet.net (192.35.219.2)  7.79 ms
>     5  sl-gw10-stk-11-0-T3.sprintlink.net (144.228.146.49)  8.70 ms
>     6  sl-bb10-stk-2-1-155M.sprintlink.net (144.232.4.78)  30.2 ms
>     7  sl-bb3-stk-0-0-0-155M.sprintlink.net (144.232.4.42)  33.0 ms
>     8  Hssi8-1-0.BR1.SFO1.ALTER.NET (137.39.166.121)  42.3 ms
>     9  114.ATM3-0-0.XR1.SCL1.ALTER.NET (146.188.145.222)  76.9 ms
>    10  100.ATM2-0-0.TR1.SCL1.ALTER.NET (146.188.145.226)  41.3 ms
>    11  107.ATM8-0-0.TR1.DCA1.ALTER.NET (146.188.136.221)  110 ms (ttl=242!)
>    12  199.ATM4-0-0.XR1.TCO1.ALTER.NET (146.188.161.161)  223 ms (ttl=243!)
>    13  193.ATM5-0-0.GW2.FFX1.ALTER.NET (146.188.160.209)  158 ms (ttl=242!)
>    14  UUNET7-GW.UU.NET (137.39.12.162)  214 ms (ttl=241!)  163 ms (ttl=241!)
>    15  www.uu.net (199.170.0.30)  155 ms (ttl=240!)
> 
> 
> from 128.114.2.91:
> 
>    traceroute to www.uu.net (199.170.0.30): 1-30 hops, 38 byte packets
>     1  comm-g.UCSC.EDU (128.114.2.252)  1.29 ms
>     2  frontdoor.UCSC.EDU (128.114.103.1)  1.7 ms
>     3  UC-net-dmz.ucsc.edu (208.1.176.6)  2.82 ms
>     4  bgty-lata01.ucnet.net (192.35.219.2)  7.75 ms
>     5  sl-gw10-stk-11-0-T3.sprintlink.net (144.228.146.49)  10.6 ms
>     6  sl-bb11-stk-1-1-155M.sprintlink.net (144.232.4.98)  11.2 ms
>     7  sl-bb3-stk-4-0-0.sprintlink.net (144.232.4.14)  9.27 ms
>     8  Hssi8-1-0.BR1.SFO1.ALTER.NET (137.39.166.121)  609 ms
>     9  114.ATM3-0-0.XR2.SCL1.ALTER.NET (146.188.145.210)  541 ms
>    10  100.ATM3-0-0.TR2.SCL1.ALTER.NET (146.188.145.246)  578 ms
>    11  107.ATM8-0-0.TR2.DCA1.ALTER.NET (146.188.136.225)  633 ms
>    12  198.ATM8-0-0.XR2.TCO1.ALTER.NET (146.188.161.185)  683 ms (ttl=243!)
>    13  192.ATM12-0-0.GW2.FFX1.ALTER.NET (146.188.160.221)  622 ms (ttl=242!)
>    14  UUNET7-GW.UU.NET (137.39.12.162)  569 ms (ttl=241!)  534 ms (ttl=241!)
>    15  www.uu.net (199.170.0.30)  579 ms (ttl=240!)

Looking at your traceroutes, there are at least two cases of this sort
of thing.

First, the paths diverge at hop 6, but your traceroutes are consistent
again at hop 8.

Second, at hop 9, they diverge once more, and become consistent at hop
14.
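
The two divergences can be checked mechanically by comparing the
addresses hop by hop. The sketch below is a hypothetical helper
(divergent_ranges is not part of any tool), fed with the addresses for
hops 5 through 15 copied from the traceroutes quoted above.

```python
def divergent_ranges(hops_a, hops_b, first_hop=1):
    """Return (start, end) hop-number ranges where two paths differ."""
    ranges = []
    start = None
    for i, (a, b) in enumerate(zip(hops_a, hops_b), start=first_hop):
        if a != b:
            if start is None:
                start = i          # a new divergent stretch begins here
        elif start is not None:
            ranges.append((start, i - 1))
            start = None
    if start is not None:          # paths were still divergent at the end
        last = first_hop + min(len(hops_a), len(hops_b)) - 1
        ranges.append((start, last))
    return ranges

# Hops 5-15 as seen from 128.114.2.53 and from 128.114.2.91.
from_53 = ["144.228.146.49", "144.232.4.78", "144.232.4.42",
           "137.39.166.121", "146.188.145.222", "146.188.145.226",
           "146.188.136.221", "146.188.161.161", "146.188.160.209",
           "137.39.12.162", "199.170.0.30"]
from_91 = ["144.228.146.49", "144.232.4.98", "144.232.4.14",
           "137.39.166.121", "146.188.145.210", "146.188.145.246",
           "146.188.136.225", "146.188.161.185", "146.188.160.221",
           "137.39.12.162", "199.170.0.30"]

print(divergent_ranges(from_53, from_91, first_hop=5))
# prints [(6, 7), (9, 13)]
```

Note that this compares interface addresses, not routers. Hops 7 and 13
carry the same router name in both traces (sl-bb3-stk and GW2.FFX1) but
different addresses, presumably different interfaces on the same box, so
they are counted as divergent here.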

Either one of these could be the cause of what you're seeing (or both).
Additionally, it's perfectly possible that there's something along
the return path which your traceroute isn't showing.

It is worth noting, I suppose, that optioned packets (e.g. traceroute -g
or ping -R) are not CEF-switched, and therefore cannot be used to
instrument the behavior of this hash. As a result, your best bet is
limited-TTL probes to various hops.

For instance, you might try

	traceroute -f 7 -m 7 -q 100 www.uu.net

from each of your hosts, to determine if the problem started after
hop 6.

--jhawk