North American Network Operators Group

Re: STILL Paging Google...

  • From: Michael.Dillon
  • Date: Wed Nov 16 09:38:32 2005

Matthew Elvey [Wed 16 Nov 2005, 01:56 CET]:
>Still no word from google, or indication that there's anything wrong 
>with the robots.txt.  Google's estimated hit count is going slightly up, 
>instead of way down.

Way back in the early '90s, someone came up with an
elegant solution to this problem. When a site was built
in a folder named /httproot, all dynamic pages, i.e.
scripts, were placed in a folder named /httproot/cgi-bin.
Then somebody invented robots.txt to allow people to
tell spiders to leave the cgi-bin folder alone.
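
As a rough illustration of that convention, here is a
minimal sketch in Python using the standard library's
urllib.robotparser. The robots.txt contents, the agent name
"ExampleBot", and the paths are assumptions made up for the
example; they are not taken from the site under discussion.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that keeps every script
# under /cgi-bin/, as in the layout described above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules allow `agent` to fetch `path`."""
    return parser.can_fetch(agent, path)


parser = build_parser(ROBOTS_TXT)

# Static pages stay crawlable; anything under /cgi-bin/ is off limits.
for path in ("/index.html", "/cgi-bin/search.pl"):
    print(path, may_fetch(parser, path))
```

A single Disallow line covers every script only because the
scripts all live under one prefix. A site that scatters
dynamic pages across its tree needs one rule per location.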

Sites which follow the ancient paradigm do not run
into these kinds of problems. Some people would say that
asking the world to re-engineer the robots.txt protocol,
instead of building sites compliant with the protocol,
violates the robustness principle as expressed
by Jon Postel in RFC 793 section 2.10 and reiterated in
section 4.5 of RFC 3117.

When something doesn't work, the correct operational
response is to fix it.

--Michael Dillon