Blocking Rogue Web crawlers

wasp (Newbie)
Posted: 12 June 2008 at 8:24pm
Is there any way to block bandwidth-hungry web robots from crawling the forum?
One in particular, Yandex, ignores the robots.txt file and is eating up loads of bandwidth.

123Simples (Senior Member)
Posted: 12 June 2008 at 8:35pm
I assume that you have uploaded the robots.txt file as below:
User-agent: *
Disallow: /images/
Disallow: /includes/

User-agent: googlebot
Crawl-delay: 60

User-agent: yahoo
Crawl-delay: 60

User-agent: Slurp
Crawl-delay: 60

User-agent: msnbot
Crawl-delay: 60

User-agent: Teoma
Crawl-delay: 60

User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: psbot
Disallow: /

User-agent: fast
Disallow: /

You could therefore add a Yandex record to block it as well, but the robots.txt file has to be in the site's root directory, of course. You seem to have a lot going on with website adverts, links, etc., and perhaps this is drawing in all and sundry too, but I'm probably wrong.
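
For anyone who wants to check what a file like the one above actually tells a robot, here is a minimal sketch using Python's standard urllib.robotparser. The file is a trimmed copy of the one above with a Yandex record added, and the paths /forum/default.asp and /images/logo.gif are made up for illustration. It only shows what a well-behaved crawler should conclude; it cannot make a crawler obey.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the robots.txt above, plus a Yandex record (the addition).
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /includes/

User-agent: msnbot
Crawl-delay: 60

User-agent: Yandex
Disallow: /
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# The paths below are hypothetical examples, not taken from the real site.
print(rules.can_fetch("Yandex", "/forum/default.asp"))     # Yandex record: blocked everywhere
print(rules.can_fetch("Googlebot", "/forum/default.asp"))  # falls back to the * record
print(rules.can_fetch("Googlebot", "/images/logo.gif"))    # /images/ is disallowed for *
print(rules.crawl_delay("msnbot"))                         # delay requested of msnbot
```

Note that the agent names passed in are the short robots.txt tokens, not the full User-Agent header a crawler sends.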

123Simples (Senior Member)
Posted: 12 June 2008 at 8:38pm
PS - You could also add:

Disallow: /forum/

wasp (Newbie)
Posted: 12 June 2008 at 8:51pm
Hi MrTWS,
Thanks for responding.
Below is what I have in the robots.txt.
I don't want to stop all robots as I want to get as many customers to the site as possible.
The other robots don't take up much bandwidth, just this Yandex one from Russia.
It doesn't appear to take any notice of the robots.txt file (I've done a search and it seems that others have the same problem with it).
Any help would be appreciated if there's something wrong with my file.
 
User-agent: Googlebot
Disallow:
User-agent: Googlebot-Image
Disallow:
User-agent: Mediapartners-Google
Disallow:
User-agent: Adsbot-Google
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: Teoma
Disallow:
User-agent: Gigabot
Disallow:
User-agent: Scrubby
Disallow:
User-agent: Robozilla
Disallow:
User-agent: Nutch
Disallow:
User-agent: ia_archiver
Disallow:
User-agent: baiduspider
Disallow:
User-agent: yahoo-mmcrawler
Disallow:
User-agent: psbot
Disallow:
User-agent: asterias
Disallow:
User-agent: yahoo-blogs/v3.9
Disallow:
User-agent: Yandex
Disallow: /
User-agent: *
Disallow: /

123Simples (Senior Member)
Posted: 12 June 2008 at 9:10pm
In another post Web Wiz Bruce wrote:
Also, particular IP ranges are usually assigned to a country. So if an IP range is common to all or a lot of them, you could block that as well in the IP Blocking section.

The 81.177.xxx.xxx range seems to be a common Russian IP range, so you could use the following to block users of that IP range:

81.177.*

That is entered within the admin area of the forum itself. But looking at your robots.txt file, I think it's wrong. Hang on a sec.
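
To illustrate what a wildcard entry such as 81.177.* would cover, here is a rough sketch using Python's fnmatch. It assumes the * simply matches the rest of the address, which is a guess at how the forum's IP Blocking section treats it, and the test addresses are made up.

```python
from fnmatch import fnmatchcase

# The pattern quoted above. How the forum really interprets "*" is an assumption.
BLOCKED_PATTERNS = ["81.177.*"]


def ip_is_blocked(ip, patterns=BLOCKED_PATTERNS):
    """Return True if the dotted-quad string matches any wildcard pattern."""
    return any(fnmatchcase(ip, pattern) for pattern in patterns)


# Made-up addresses, used only to show which ones the pattern catches.
print(ip_is_blocked("81.177.12.34"))   # inside the 81.177 range
print(ip_is_blocked("81.178.12.34"))   # neighbouring range, not matched
print(ip_is_blocked("181.177.12.34"))  # the pattern is anchored at the start
```

A pattern this broad covers every address starting with 81.177, so it will also catch ordinary visitors in that range, not just crawlers.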

wasp (Newbie)
Posted: 13 June 2008 at 3:18pm
Hi MrTWS,
I have 77.88.22, 77.88.23 & 77.88.24 blocked but (I think) this only blocks registering and posting.

123Simples (Senior Member)
Posted: 13 June 2008 at 6:52pm
Did you check your robots.txt file?
I can only think of a few possibilities.

Firstly, I believe the Yandex IP address is:
77.88.26.28

So you could add that as a blocked IP address.
Next, add this to your robots.txt file:
User-agent: Yandex
Disallow: /

However, you may need to use .htaccess to block it more forcefully, and I have not had time to research that bit tonight.

Good luck
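
As a rough illustration of what blocking more forcefully means, the server itself can refuse the request instead of relying on the crawler to honour robots.txt. The sketch below shows the idea as a small Python WSGI wrapper that answers 403 when the User-Agent header contains a blocked token. It is not .htaccess syntax and not something to drop into the forum as is; the token list, the demo app and the sample User-Agent strings are all hypothetical.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical block list; tokens are compared case-insensitively.
BLOCKED_AGENT_TOKENS = ("yandex",)


def block_crawlers(app, tokens=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app so requests from blocked User-Agents get a 403."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in for the real site; always serves a tiny page."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"forum page\n"]


def status_for(user_agent):
    """Send one fake request through the wrapped app and return its status line."""
    environ = {"HTTP_USER_AGENT": user_agent}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    seen = []
    wrapped = block_crawlers(demo_app)
    wrapped(environ, lambda status, headers: seen.append(status))
    return seen[0]


# Sample User-Agent strings, made up for the demonstration.
print(status_for("Mozilla/5.0 (compatible; YandexBot/3.0)"))
print(status_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

A crawler can change its User-Agent string, so a check like this is usually combined with IP blocking rather than used alone.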

eimee (Newbie)
Posted: 19 June 2008 at 12:24pm
I think you can solve your problem with the help of robots.txt.