North American Network Operators Group

Re: MTU problems with GRE tunnels (fwd)

  • From: Alex P. Rudnev
  • Date: Mon Jul 06 12:41:13 1998

It's a well-known problem... not with Cisco (any connectionless 
tunnelling reduces the MTU) but with those MS-based applications 
which do not know how to deal with fragmentation AND use big (1500 
byte) packets. The only thing that has to be fixed here is the 
applications, not the routers... Though it's possible to imagine some ways 
to work around this in the router's software...

The application MUST:
- not set the DF bit;
OR
- never use long (> 1024 byte) packets, AND understand ICMP 
messages about MTU size and the path MTU discovery protocol.

Any application that does not conform to this is not guaranteed to 
work on the Internet.
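
To put numbers on the MTU reduction, here is a rough sketch in Python. It assumes an IPv4 delivery header without options (20 bytes) and a GRE header of 4 bytes, plus 4 more when a tunnel key is configured. The function and constant names are invented for this illustration.

```python
# Sketch: how much room is left inside a GRE-over-IPv4 tunnel.
# Assumptions: outer IPv4 header without options (20 bytes); GRE base
# header of 4 bytes, plus 4 bytes when a key is configured.

OUTER_IP_HEADER = 20
GRE_BASE_HEADER = 4
GRE_KEY_FIELD = 4


def tunnel_mtu(link_mtu, with_key=False):
    """Largest inner packet that fits in one unfragmented outer packet."""
    overhead = OUTER_IP_HEADER + GRE_BASE_HEADER
    if with_key:
        overhead += GRE_KEY_FIELD
    return link_mtu - overhead


def fate_of_packet(size, link_mtu, df_set, with_key=False):
    """What happens to an inner packet of `size` bytes entering the tunnel."""
    if size <= tunnel_mtu(link_mtu, with_key):
        return "forwarded"
    # Too big: DF forbids fragmentation, so the packet is dropped and an
    # ICMP message goes back to the sender; otherwise it is fragmented.
    return "dropped + ICMP" if df_set else "fragmented"


if __name__ == "__main__":
    print(tunnel_mtu(1500))                  # 1476
    print(tunnel_mtu(1500, with_key=True))   # 1472
    print(fate_of_packet(1500, 1500, df_set=True))
    print(fate_of_packet(1024, 1500, df_set=True))
```

With a 1500 byte cloud, a full-sized packet with DF set never fits, while a 1024 byte packet always does.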


On Mon, 6 Jul 1998, Jens Schweikhardt wrote:

> Date: Mon, 6 Jul 1998 13:39:04 +0200 (MET DST)
> From: Jens Schweikhardt <[email protected]>
> To: [email protected], [email protected], [email protected]
> Cc: DFN NOC <[email protected]>
> Subject: MTU problems with GRE tunnels (fwd)
> 
> To whom it may concern,
> 
> here is some email I received in the last months followed by
> some of my observations which might be related to the problems
> discussed. I have posted my observations to comp.sys.dcom.cisco
> and opened a trouble ticket with cisco's technical assistance center.
> 
> # Forwarded message:
> # > From merit.edu!errors-nohumans Fri Jun  5 23:44:49 1998
> # > Message-Id: <[email protected]>
> # > X-Sender: [email protected]
> # > X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.3 (32)
> # > Date: Fri, 05 Jun 1998 09:53:58 +0100
> # > To: [email protected]
> # > From: philip bridge <[email protected]>
> # > Subject: MTU problems with GRE tunnels
> # > Mime-Version: 1.0
> # > Content-Type: text/plain; charset="us-ascii"
> # > Sender: [email protected]
> # > Content-Length: 1881
> # > 
> # > I'm experiencing problems with fragmentation due to Cisco GRE tunnel
> # > overhead: the way I understand it, the MTU of a GRE tunnel will always be
> # > less than the MTU of the underlying IP cloud (in our case 1500 bytes) due
> # > to the IP encapsulation overhead. So 1500 byte packets attempting to
> # > traverse the tunnel will be fragmented, or dropped if the DF bit is set, in
> # > which case an ICMP message is sent back to the originating host.
> # > 
> # > We're trying to use GRE tunnels extensively in some fancy added-value
> # > Internet services, and it seems that there is a small but significant
> # > amount of application traffic out there that has problems when traversing a
> # > GRE tunnel with MTU < 1500. We've seen two problems:
> # > 
> # > - 1500 byte packets with DF set. This is either application traffic, or MTU
> # > path discovery is broken, because the same packets get sent repeatedly
> # > - 1500 byte packets get fragmented, but the destination host cannot cope
> # > with the fragmentation (firewall issues?)
> # > 
> # > We see this on a variety of platforms (2500, 7507) and a variety of
> # > IOS releases (11.1(18)CC, 11.1(2), 11.2(5)). Talking to another provider
> # > indicates that the same problem exists with other vendors, and is having
> # > the same severe impact.
> # > 
> # > Thinking about it, this is a problem that is to be expected with IP tunnels of
> # > all types, but I am surprised at the extent of its influence on our
> # > customers' applications (such as large emails). I do not want to overstate
> # > the proportion of traffic we see with this problem - but it does seem to be
> # > enough to render GRE tunnels very problematic - to say the least. But I
> # > know lots of people are using GRE for this or similar applications...so
> # > what am I missing here?
> # > 
> # > thanks in advance for help/tips
> # > 
> # > Phil
> # > 
> # > 
> # > 
> # > ______________________________________________________________
> # > Philip Bridge	
> # > ++41 31 688 8262	[email protected]     www.ip-plus.ch
> # > PGP: DE78 06B7 ACDB CB56 CE88 6165 A73F B703
> # > 
> # 
> # 
> # -- 
> # Bernhard Kroenung, Bahnhofstr 8, 36157 Ebersburg/Rhoen, Germany +49 6656 910101
> # @work : [email protected]                              Work: +49 661 9011777
> # @home : [email protected]       @school : [email protected]
> # 
> 
> hello, world\n
> 
> Here's something very strange I observe with GRE tunnels (the default
> tunnel mode). It looks like cisco routers send IP datagrams violating RFC 791
> [Internet Protocol] over GRE tunnels. In particular, the length field of
> the IP header is computed incorrectly to *not* include the size of the
> IP header. RFC 791 says about the length field:
> 
> <quote>
> 
>   Total Length:  16 bits
> 
>     Total Length is the length of the datagram, measured in octets,
>     including internet header and data.  This field allows the length of
>     a datagram to be up to 65,535 octets.  ...
> 
> </quote>
> 
> I have an application on my workstation that serves as one endpoint
> of a GRE tunnel. In fact, it's such a tiny perl program that I have
> appended it at the end of this mail.
> 
> Here's the tunnel config on my cisco, which is a
> IOS (tm) 4500 Software (C4500-P-M), Version 11.2(9), RELEASE SOFTWARE (fc1):
> 
> interface Tunnel2
>  description GRE Test Tunnel
>  ip address 10.0.0.1 255.255.255.252
>  tunnel source 193.174.247.254           !another iface of this cisco
>  tunnel destination 193.174.247.193      !my workstation's address
>  tunnel key 42                           !optional
> 
> Let's ping the other end of the tunnel:
> io#ping 10.0.0.2
>  
> Type escape sequence to abort.
> Sending 5, 100-byte ICMP Echos to 10.0.0.2, timeout is 2 seconds:
> .....
> Success rate is 0 percent (0/5)
> 
> Here's what the perl tunnel endpoint outputs:
> Length of received packet: 128   <<<<<<<<< Note this
> version:      4
> header len:   5
> tos:          0
> length:       108                <<<<<<<<< Note this
> id:           1586
> flags:        0
> offset:       0
> ttl:          255
> protocol:     47
> chksum:       16895
> source:       193.174.247.254
> destination:  193.174.247.193
>  20 00 08 00 00 00 00 2a 45 00 00 64 01 39 00 00
>  ff 01 a6 5d 0a 00 00 01 0a 00 00 02 08 00 51 68
>  00 00 23 a5 00 00 00 01 9a 8b 6e b0 ab cd ab cd
>  ab cd ab cd ab cd ab cd ab cd ab cd ab cd ab cd
>  ab cd ab cd ab cd ab cd ab cd ab cd ab cd ab cd
>  ab cd ab cd ab cd ab cd ab cd ab cd ab cd ab cd
>  ab cd ab cd ab cd ab cd ab cd ab cd
> 
> Or let's try a telnet session:
> io#telnet 10.0.0.2
> Trying 10.0.0.2 ... 
> 
> Length of received packet: 72   <<<<<<<<< Note this
> version:      4
> header len:   5
> tos:          0
> length:       52                <<<<<<<<< Note this
> id:           1591
> flags:        0
> offset:       0
> ttl:          255
> protocol:     47
> chksum:       16946
> source:       193.174.247.254
> destination:  193.174.247.193
>  20 00 08 00 00 00 00 2a 45 00 00 2c 00 00 00 00
>  ff 06 a7 c9 0a 00 00 01 0a 00 00 02 52 02 00 17
>  52 c8 26 04 00 00 00 00 60 02 10 c0 a8 9a 00 00
>  02 04 05 98
> 
> We note that the length as reported in the IP header is
> always 20 octets less than what we receive on the socket.
> This leads me to the question
> 
>   Do you cisco guys read RFCs? :-)
> 
> Regards,
> 
> 	Jens Schweikhardt
> -- 
> ##   Network Operation Center,  DFN-Verein Geschäftsstelle Stuttgart   ##
> ## http://www.noc.dfn.de/ finger [email protected] [email protected] ##
> ##               >>>>>>  mailto:  [email protected]  <<<<<<               ##
> 
> 
> 
> Here's my perl script:
> 
> #!/usr/local/bin/perl5 -w
> #
> # GRE Tunnel Endpoint; prints all GRE packets received.
> #
> # Author: Jens Schweikhardt <[email protected]>
> #
> # >>> You probably need root permission to open the raw socket. <<<
> 
> use Socket qw (SOCK_RAW PF_INET);
> use strict;
> 
> my $gre = 47; # Generic Routing Encapsulation
> my $rbits;    # bitmask with read file descriptors for select
> my $out;      # writable copy of rbits for select to clobber
> my $nready;   # return value from select
> 
> unless (socket (SOCKET, &PF_INET(), &SOCK_RAW(), $gre)) {
>     print STDERR "gre socket: $!\n";
>     exit 1;
> }
> $rbits = ''; vec ($rbits, fileno SOCKET, 1) = 1;
> for (;;) {
>     $nready = select ($out = $rbits, undef, undef, undef);
>     last unless defined $nready; # Should not happen...
>     &receive_packet () if $nready; # A packet is waiting
> }
> close SOCKET;
> exit 0;
> 
> sub receive_packet {
>     my $from_msg = '';
>     my $from_saddr = recv (SOCKET, $from_msg, 1500, 0);
>     unless (defined $from_saddr) {
>         print STDERR "recv: $!\n";
>         return 0;
>     }
>     print "\nLength of received packet: ", length ($from_msg), "\n";
>     my ($delivery_ip_version,
>         $delivery_ip_ihl,
>         $delivery_ip_tos,
>         $delivery_ip_length,
>         $delivery_ip_id,
>         $delivery_ip_flags,
>         $delivery_ip_offset,
>         $delivery_ip_ttl,
>         $delivery_ip_proto,
>         $delivery_ip_chksum,
>         $delivery_ip_src,
>         $delivery_ip_dst,
>         $delivery_ip_options,
>         $delivery_ip_data
>     ) = &ip_unpack ($from_msg);
> 
>     print "version:      $delivery_ip_version\n";
>     print "header len:   $delivery_ip_ihl\n";
>     print "tos:          $delivery_ip_tos\n";
>     print "length:       $delivery_ip_length\n";
>     print "id:           $delivery_ip_id\n";
>     print "flags:        $delivery_ip_flags\n";
>     print "offset:       $delivery_ip_offset\n";
>     print "ttl:          $delivery_ip_ttl\n";
>     print "protocol:     $delivery_ip_proto\n";
>     print "chksum:       $delivery_ip_chksum\n";
>     # Repack with 'N' (network byte order) so the octets come out in the
>     # right order on little-endian hosts too; 'L' is native byte order.
>     printf "source:       %u.%u.%u.%u\n",
>         unpack ('C4', pack ('N', $delivery_ip_src));
>     printf "destination:  %u.%u.%u.%u\n",
>         unpack ('C4', pack ('N', $delivery_ip_dst));
>     &dump ($delivery_ip_data);
> }
> 
> sub dump {
>     my $len = length ($_[0]);
>     if ($len > 0) {
>         my @octet = split //, $_[0];
>         my $i;
>         for ($i = 1; $i <= $len; ++$i) {
>             printf " %02x", unpack ('C', $octet[$i-1]);
>             print "\n" unless $i % 16;
>         }
>         print "\n" if $len % 16; # terminate a partial last row
>     } else {
>         print " [NO DATA]\n";
>     }
> }
> 
> # Format of an IP packet, RFC 791.
> #
> sub ip_unpack {
>     my $packet = shift;
>     if (length ($packet) < 20) {
>         print STDERR "ip packet too short: ", length ($packet), " bytes\n";
>         exit 1;
>     }
>     my (
>         $version,
>         $tos,
>         $length,
>         $id,
>         $flags,
>         $ttl,
>         $proto,
>         $chksum,
>         $src,
>         $dst
>     ) = unpack ('CCnnnCCnNN', $packet);
>     my $ihl = $version & 017;
>     $version >>= 4;
>     if ($version != 4) {
>         print STDERR "ip version mismatch, expected 4, got $version\n";
>         exit 1;
>     }
>     my $offset = $flags & 017777;
>     $flags >>=13;
>     my $options = substr ($packet, 20, $ihl * 4 - 20);
>     my $data = substr ($packet, $ihl * 4);
>     return (
>         $version,
>         $ihl,
>         $tos,
>         $length,
>         $id,
>         $flags,
>         $offset,
>         $ttl,
>         $proto,
>         $chksum,
>         $src,
>         $dst,
>         $options,
>         $data
>     );
> }
> 
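
The length arithmetic in the ping dump quoted above can be checked by hand. Here is a small Python sketch using the first 12 octets printed by the perl script (the GRE header plus the start of the inner IP header). The helper name `explain_lengths` is made up for this example, and source route entries are not handled. It only checks the arithmetic and does not tell which side produced the 108.

```python
import struct

# First 12 octets of the payload from the quoted ping dump:
# 8 octets of GRE header (key present) followed by the first
# 4 octets of the inner IP header.
DUMP = bytes.fromhex("20 00 08 00 00 00 00 2a 45 00 00 64")


def explain_lengths(payload, outer_ihl_words=5):
    """Return the lengths implied by a GRE payload (hypothetical helper)."""
    flags, proto = struct.unpack("!HH", payload[:4])
    gre_len = 4
    key = None
    if flags & 0xC000:      # checksum or routing present: 4 more octets
        gre_len += 4
    if flags & 0x2000:      # key present
        (key,) = struct.unpack("!I", payload[gre_len:gre_len + 4])
        gre_len += 4
    if flags & 0x1000:      # sequence number present
        gre_len += 4
    # Inner IPv4 header starts here: version/IHL, TOS, total length.
    _, _, inner_total = struct.unpack("!BBH", payload[gre_len:gre_len + 4])
    outer_header = outer_ihl_words * 4
    return {
        "protocol": proto,
        "key": key,
        "gre_header": gre_len,
        "inner_total": inner_total,
        "expected_outer_total": outer_header + gre_len + inner_total,
        "outer_minus_header": gre_len + inner_total,
    }


if __name__ == "__main__":
    for name, value in explain_lengths(DUMP).items():
        print(f"{name:22s}{value}")
```

For the ping this gives 20 + 8 + 100 = 128 octets on the wire, and 108 once the 20 octet delivery header is left out, which matches the two numbers marked in the dump. The telnet dump works out the same way: 44 + 8 + 20 = 72, and 72 - 20 = 52.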

Aleksei Roudnev, Network Operations Center, Relcom, Moscow
(+7 095) 194-19-95 (Network Operations Center Hot Line),(+7 095) 239-10-10, N 13729 (pager)
(+7 095) 196-72-12 (Support), (+7 095) 194-33-28 (Fax)