
Blocking Rogue Web crawlers

Printed From: Web Wiz Forums
Category: Web Wiz Web App Support Forums
Forum Name: Web Wiz Forums
Forum Description: Support forum for Web Wiz Forums application.
URL: https://forums.webwiz.net/forum_posts.asp?TID=25821
Printed Date: 03 April 2026 at 11:29am
Software Version: Web Wiz Forums 12.08 - https://www.webwizforums.com


Topic: Blocking Rogue Web crawlers
Posted By: wasp
Subject: Blocking Rogue Web crawlers
Date Posted: 12 June 2008 at 8:24pm
Is there any way to block bandwidth-hungry web robots from crawling the forum?
One in particular, Yandex, ignores the robots.txt file and is eating up loads of bandwidth.


-------------
Wasp



Replies:
Posted By: 123Simples
Date Posted: 12 June 2008 at 8:35pm
I assume that you have uploaded the robots.txt file as below:
User-agent: *
Disallow: /images/
Disallow: /includes/

User-agent: googlebot
Crawl-delay: 60

User-agent: yahoo
Crawl-delay: 60

User-agent: Slurp
Crawl-delay: 60

User-agent: msnbot
Crawl-delay: 60

User-agent: Teoma
Crawl-delay: 60

User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: psbot
Disallow: /

User-agent: fast
Disallow: /

One assumes, therefore, that you could add Yandex to get rid of it too, but the robots.txt file has to be in the root directory of the site, of course. You seem to have a lot going on with website adverts, links and so on, and perhaps this is drawing in all and sundry too, but I'm probably wrong.


-------------
Visit 123 Simples Web Design: http://www.123simples.com/


Posted By: 123Simples
Date Posted: 12 June 2008 at 8:38pm
PS - You could also add:

Disallow: /forum/


-------------
Visit 123 Simples Web Design: http://www.123simples.com/


Posted By: wasp
Date Posted: 12 June 2008 at 8:51pm
Hi MrTWS,
Thanks for responding.
Below is what I have in the robots.txt.
I don't want to stop all robots as I want to get as many customers to the site as possible.
The other Robots don't take up much - just this Yandex one from Russia.
It doesn't appear to take any notice of the Robots.txt file (I've done a search and it seems that others have the same problem with it doing this).
Any help would be appreciated if there's something wrong with my file.
 
User-agent: Googlebot
Disallow:
User-agent: Googlebot-Image
Disallow:
User-agent: Mediapartners-Google
Disallow:
User-agent: Adsbot-Google
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: Teoma
Disallow:
User-agent: Gigabot
Disallow:
User-agent: Scrubby
Disallow:
User-agent: Robozilla
Disallow:
User-agent: Nutch
Disallow:
User-agent: ia_archiver
Disallow:
User-agent: baiduspider
Disallow:
User-agent: yahoo-mmcrawler
Disallow:
User-agent: psbot
Disallow:
User-agent: asterias
Disallow:
User-agent: yahoo-blogs/v3.9
Disallow:
User-agent: Yandex
Disallow: /
User-agent: *
Disallow: /
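
A quick way to see how a standards-following crawler should read a file like the one above is to feed it to a robots.txt parser. The following is only a sketch, assuming Python's standard `urllib.robotparser` module and a shortened copy of the file (one allowed robot, the Yandex group, and the catch-all group); the `allowed` helper and the test path are made up for illustration. If the parser reports Yandex as disallowed, the file itself is probably not the problem, and the crawler is simply not honouring it.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the file above: one allowed robot, the Yandex group,
# and the catch-all group. The remaining groups follow the same pattern.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:
User-agent: Yandex
Disallow: /
User-agent: *
Disallow: /
"""


def allowed(user_agent: str, path: str = "/forum_posts.asp") -> bool:
    """Return True if the rules above permit user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


# "SomeOtherBot" is a hypothetical name standing in for any unlisted robot.
results = {agent: allowed(agent) for agent in ("Googlebot", "Yandex", "SomeOtherBot")}
print(results)
```

An empty `Disallow:` line means "nothing is disallowed", so Googlebot is let in, while Yandex and any robot not listed by name fall under `Disallow: /`.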


-------------
Wasp


Posted By: 123Simples
Date Posted: 12 June 2008 at 9:10pm
In another post Web Wiz Bruce wrote:
Also, particular IP ranges are usually assigned to a country. So if an IP range is common to all, or a lot, of them, you could block that range as well in the IP Blocking section.

The 81.177.xxx.xxx range seems to be a common Russian IP range, so you could use the following to block users of that IP range:

81.177.*

That was to be entered within the admin area of the forum itself. But looking at your robots.txt file, I think it's wrong. Hang on a sec.
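
To make it clearer what a wildcard entry such as 81.177.* covers, here is a rough sketch in Python. It assumes that a trailing `*` means "any value for the remaining octets"; how the forum's IP Blocking section actually matches patterns is not shown in this thread, and the `wildcard_to_network` helper is made up for illustration.

```python
import ipaddress


def wildcard_to_network(pattern: str) -> ipaddress.IPv4Network:
    """Turn a pattern like '81.177.*' into the network it is assumed to cover."""
    # Keep only the fixed octets; every '*' is treated as "any value".
    octets = [part for part in pattern.split(".") if part != "*"]
    if not 1 <= len(octets) <= 4:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    prefix_length = 8 * len(octets)
    padded = octets + ["0"] * (4 - len(octets))
    return ipaddress.ip_network(".".join(padded) + f"/{prefix_length}")


network = wildcard_to_network("81.177.*")
print(network, network.num_addresses)

# Only addresses that start with 81.177. fall inside the blocked range.
print(ipaddress.ip_address("81.177.5.9") in network)
print(ipaddress.ip_address("77.88.26.28") in network)
```

Under that assumption, 81.177.* covers the whole 81.177.0.0/16 block, which is 65,536 addresses, so a pattern this broad will also shut out any ordinary visitors in that range.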


-------------
Visit 123 Simples Web Design: http://www.123simples.com/


Posted By: wasp
Date Posted: 13 June 2008 at 3:18pm
Hi MrTWS,
I have 77.88.22, 77.88.23 & 77.88.24 blocked but (I think) this only blocks registering and posting.


-------------
Wasp


Posted By: 123Simples
Date Posted: 13 June 2008 at 6:52pm
Did you check your robots.txt file?
I can only think of a few possibilities.

Firstly, I believe the Yandex IP address is:
77.88.26.28

So you could add that as a blocked IP address.
Next, add this to your robots.txt file:
User-agent: Yandex
Disallow: /

However, you may need to use .htaccess to block more forcefully and I have not had time to research that bit tonight
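
For what it's worth, blocking "more forcefully" comes down to refusing the request on the server instead of asking the robot to stay away. The sketch below is only an illustration of that logic in Python, not an .htaccess rule and not anything built into the forum. The `status_for_request` function is hypothetical, and the user-agent substring and addresses are taken from this thread, not from any official Yandex list.

```python
import ipaddress

# Assumptions for illustration: values mentioned in this thread only.
BLOCKED_AGENT_SUBSTRINGS = ("yandex",)
BLOCKED_NETWORKS = [
    ipaddress.ip_network("77.88.22.0/24"),
    ipaddress.ip_network("77.88.23.0/24"),
    ipaddress.ip_network("77.88.24.0/24"),
    ipaddress.ip_network("77.88.26.28/32"),
]


def status_for_request(user_agent: str, remote_ip: str) -> int:
    """Return 403 for a blocked user agent or address, otherwise 200."""
    agent = user_agent.lower()
    if any(fragment in agent for fragment in BLOCKED_AGENT_SUBSTRINGS):
        return 403
    address = ipaddress.ip_address(remote_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return 403
    return 200


# The example requests below use made-up user-agent strings and addresses.
print(status_for_request("Yandex/1.01.001 (compatible; Win16; I)", "203.0.113.5"))
print(status_for_request("Mozilla/5.0", "77.88.26.28"))
print(status_for_request("Googlebot/2.1", "66.249.66.1"))
```

Unlike robots.txt, a check like this does not rely on the robot's cooperation, although a crawler that changes its user-agent string or address will still get through.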

Good luck!


-------------
Visit 123 Simples Web Design: http://www.123simples.com/


Posted By: eimee
Date Posted: 19 June 2008 at 12:24pm
I think you can solve your problem with the help of robots.txt.



Copyright ©2001-2026 Web Wiz Ltd. - https://www.webwiz.net