North American Network Operators Group
Re: broke Inktomi floods?
On Thu, 20 Jan 2005 14:30:04 +0200, Gadi Evron <[email protected]> wrote:
>
> Inktomi (now Yahoo!) sends its spiders all over the Internet. Lately
> some of our systems are reporting that they open many HTTP connections
> to our web sites, without ever sending any data and immediately
> disconnecting. This is getting to a level where it disturbs us.
>

I have heard previous stories of Inktomi ignoring robots.txt (not seen
this for myself though). And there are threads like this -

Quoting from http://www.webmasterworld.com/forum11/1968-1-15.htm

> I've got Scooter allowed in, but I've also got it lumped in with a
> number of agents that are not allowed to get non-HTML files. This is
> especially important at my site as it includes a number of very large
> binary datasets in numerous locations and the robots have proven too
> stupid to understand that downloading them is a waste of bandwidth.
>
> RewriteCond %{HTTP_USER_AGENT} .*Ask.Jeeves.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*FAST.WebCrawl.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*ia_archiver.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*InfoSeek.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*inktomi.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*Scooter.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*Slurp.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*Teoma.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*VoilaBot.* [OR]
> RewriteCond %{HTTP_USER_AGENT} .*Google.*
> RewriteRule !.*(html|htm|txt|/)$ /www/msgs/badagent.html [F]
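
For anyone less familiar with mod_rewrite, the logic of the quoted rules can be sketched outside Apache. The Python below is only an illustration and is not from the thread: the function name is_forbidden and the constant names are made up here, matching is kept case-sensitive (as it would be without an [NC] flag), and the path is assumed to be the full URL path as seen in server context.

```python
import re

# User-agent patterns copied from the quoted RewriteCond lines.
# The leading and trailing ".*" are dropped because re.search() is
# unanchored anyway; the inner "." still matches any single character.
BLOCKED_AGENT_PATTERNS = [
    r"Ask.Jeeves", r"FAST.WebCrawl", r"ia_archiver", r"InfoSeek",
    r"inktomi", r"Scooter", r"Slurp", r"Teoma", r"VoilaBot", r"Google",
]
AGENT_RE = re.compile("|".join(BLOCKED_AGENT_PATTERNS))

# Paths ending in html, htm, txt or "/" stay reachable for these agents.
ALLOWED_PATH_RE = re.compile(r".*(html|htm|txt|/)$")


def is_forbidden(user_agent: str, path: str) -> bool:
    """Return True if a listed agent asks for a path outside the allowed set."""
    listed_agent = AGENT_RE.search(user_agent) is not None
    allowed_path = ALLOWED_PATH_RE.match(path) is not None
    return listed_agent and not allowed_path


if __name__ == "__main__":
    slurp = "Mozilla/5.0 (compatible; Yahoo! Slurp)"
    print(is_forbidden(slurp, "/data/big.bin"))   # listed agent, binary file
    print(is_forbidden(slurp, "/index.html"))     # listed agent, HTML page
    print(is_forbidden("SomeBrowser/1.0", "/data/big.bin"))  # unlisted agent
```

Note that this only mirrors the user-agent check. A crawler that sends a different User-Agent string, or that opens a connection and disconnects without sending a request at all (as in the original report), would not be affected by rules of this kind.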