North American Network Operators Group

Re: broke Inktomi floods?

  • From: Suresh Ramasubramanian
  • Date: Thu Jan 20 07:44:31 2005

On Thu, 20 Jan 2005 14:30:04 +0200, Gadi Evron <[email protected]> wrote:
> 
> Inktomi (now Yahoo!) sends its spiders all over the Internet. Lately
> some of our systems are reporting that they open many HTTP connections
> to our web sites, without ever sending any data and immediately
> disconnecting. This is getting to a level where it disturbs us.
> 

I have heard earlier stories of Inktomi ignoring robots.txt (I have not
seen this myself, though). There are also threads like this one:

Quoting from http://www.webmasterworld.com/forum11/1968-1-15.htm

> I've got Scooter allowed in, but I've also got it lumped in with a
> number of agents that are not allowed to get non-HTML files. This is
> especially important at my site as it includes a number of very large
> binary datasets in numerous locations and the robots have proven too
> stupid to understand that downloading them is a waste of bandwidth.
> 
> RewriteCond %{HTTP_USER_AGENT} .*Ask.Jeeves.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*FAST.WebCrawl.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*ia_archiver.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*InfoSeek.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*inktomi.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*Scooter.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*Slurp.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*Teoma.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*VoilaBot.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*Google.*
> RewriteRule !.*(html|htm|txt|/)$ /www/msgs/badagent.html [F]
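
For anyone who wants to check what the quoted rules would block before
putting them on a live server, here is a rough Python sketch of the same
decision. It is not how Apache evaluates the rules internally, only an
approximation. The function name is_forbidden and the sample User-Agent
strings are my own, for illustration. I am also assuming the rules sit in
server context (so the path keeps its leading slash) and that matching is
case-sensitive, since the quoted conditions carry no [NC] flag.

```python
import re

# User-agent patterns copied from the quoted RewriteCond lines.
# As in the original, "." matches any single character (e.g. a space).
AGENT_PATTERNS = [
    r"Ask.Jeeves", r"FAST.WebCrawl", r"ia_archiver", r"InfoSeek",
    r"inktomi", r"Scooter", r"Slurp", r"Teoma", r"VoilaBot", r"Google",
]
# The [OR] flags chain the conditions, so any one match is enough.
AGENT_RE = re.compile("|".join(AGENT_PATTERNS))

# Mirrors the RewriteRule pattern: paths ending in html, htm, txt or "/".
# The leading "!" in the rule negates it, hence the "not" below.
ALLOWED_PATH_RE = re.compile(r".*(html|htm|txt|/)$")


def is_forbidden(user_agent: str, path: str) -> bool:
    """Return True if the quoted rules would answer with 403 Forbidden.

    Hypothetical helper: approximates the rule set, assuming server
    context and case-sensitive matching.
    """
    listed_agent = AGENT_RE.search(user_agent) is not None
    allowed_path = ALLOWED_PATH_RE.match(path) is not None
    return listed_agent and not allowed_path


if __name__ == "__main__":
    slurp = "Mozilla/5.0 (compatible; Yahoo! Slurp)"  # illustrative UA string
    print(is_forbidden(slurp, "/data/big.bin"))   # True: listed agent, binary file
    print(is_forbidden(slurp, "/index.html"))     # False: HTML is still allowed
    print(is_forbidden("Mozilla/4.0", "/data/big.bin"))  # False: agent not listed
```

Note that the path pattern has no dot before the extensions, so anything
whose name merely ends in "html", "htm" or "txt" gets through as well.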