
How about allowing bots/spiders?

Printed From: Web Wiz Forums
Category: Web Wiz Web App Support Forums
Forum Name: Web Wiz Forums Suggestions
Forum Description: Do you have any ideas for applications or content on Web Wiz? Then leave your suggestions here.
URL: https://forums.webwiz.net/forum_posts.asp?TID=7828
Printed Date: 28 March 2026 at 3:10pm
Software Version: Web Wiz Forums 12.08 - https://www.webwizforums.com


Topic: How about allowing bots/spiders?
Posted By: Bluefrog
Subject: How about allowing bots/spiders?
Date Posted: 03 December 2003 at 10:29am

Since WWF is basically server-side, bots won't be able to see content on pages. What about a mod or option to allow bots (e.g. 'googlebot') to search those pages and index them? It could be googlebot or slurp or whatever. Just spiders...

I can contribute via code, spider names, and spider IP addresses. Google has at least 150 spider IPs that I know of.

Ideas? Comments?

 



-------------
http://renegademinds.com/ - Renegade Minds - Guitar Software
http://renegademinds.com/Default.aspx?tabid=65 - Slow Down Music



Replies:
Posted By: WebWiz-Bruce
Date Posted: 03 December 2003 at 10:54am
You are wrong. Googlebot and other spiders request pages over HTTP, so each page is run through ASP.DLL on the server first, and the spider sees the same built-up page that you and I see through a browser.

If you do some searches on Google you will find that many of the pages in this forum are indexed by Google.

I found one just now. Click the link below to see a page of this forum that has been indexed by Google; it should be the first of the returned results:

http://www.google.com/search?q=WWF+hack&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8
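As a rough illustration of the point above about spiders seeing the built-up page: the sketch below is in Python rather than ASP, and the handler, the page content and the helper names (ForumHandler, fetch, compare_views) are all made up for the example. It starts a throwaway local server that assembles a page on the server side, then requests that page once with a browser-style User-Agent and once with a Googlebot-style one.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ForumHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a server-side script such as forum_posts.asp."""

    def do_GET(self):
        # The page is assembled on the server; the client only ever receives HTML.
        body = "<html><body><h1>Topic</h1><p>First post</p></body></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(url, user_agent):
    """Fetch a URL with the given User-Agent header, ignoring any system proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(request) as response:
        return response.read().decode("utf-8")


def compare_views():
    """Return the HTML seen by a browser-like client and by a spider-like client."""
    server = HTTPServer(("127.0.0.1", 0), ForumHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/forum_posts.asp?TID=1"
        browser_html = fetch(url, "Mozilla/5.0")
        spider_html = fetch(url, "Googlebot/2.1 (+http://www.google.com/bot.html)")
        return browser_html, spider_html
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    browser_html, spider_html = compare_views()
    print("same page for both clients:", browser_html == spider_html)
```

Because the handler never looks at the User-Agent, both clients get identical HTML, which is the situation described for public forum pages.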


-------------
https://www.webwiz.net/web-wiz-forums/forum-hosting.htm - Web Wiz Forums Hosting
https://www.webwiz.net/web-hosting/windows-web-hosting.htm - ASP.NET Web Hosting


Posted By: Bluefrog
Date Posted: 03 December 2003 at 7:47pm

Perhaps I wasn't clear enough.

I only put "restricted" in the question, not the title or body. I mean restricted pages, i.e. pages that you must have permission to view.

Letting spiders in would make that content available in search engine caches, which may not be appropriate for some things, but for a lot of purposes it should be fine.

e.g. A members-only forum is restricted, but we allow Google to peek in on it. Users arriving from search engines would still need to log in before viewing the content.

For security, spiders can be identified by IP address rather than by User Agent alone, since users can spoof their User Agent.
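A minimal sketch of the flow being proposed here, in Python rather than ASP. The function name, the "/login" path and the dictionary shape are hypothetical, and how a spider gets "verified" is deliberately left outside the function.

```python
def respond_to_request(is_member, is_verified_spider, page_html):
    """Decide what a restricted forum page returns for a single request.

    is_member          -- True if the visitor is logged in with permission to view
    is_verified_spider -- True if the request was accepted as a search engine spider
                          (the verification method itself is not shown here)
    page_html          -- the fully rendered page
    """
    if is_member or is_verified_spider:
        # Note: anything returned to a spider may be stored and shown from the
        # search engine's cache, which is the objection raised in the replies.
        return {"status": 200, "body": page_html}
    # Everyone else, including visitors arriving from a search result,
    # is sent to a (hypothetical) login/registration page.
    return {"status": 302, "location": "/login"}


if __name__ == "__main__":
    print(respond_to_request(False, False, "<p>members only</p>"))
```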

etc... etc...

 





Posted By: ljamal
Date Posted: 03 December 2003 at 10:09pm
But Google uses caching, so if the area was restricted and Google was allowed in, everyone would be able to peek at the cached page at Google, which totally defeats the purpose of having a restricted forum.

-------------
L. Jamal Walton

http://www.ljamal.com/ - L. Jamal Inc: Web/Print Design and ASP Programming


Posted By: WebWiz-Bruce
Date Posted: 04 December 2003 at 2:49am
I agree. Have a look at the cached link at the bottom of Google's returned searches: it caches the whole page, which is useful if the site is down or has changed since Google indexed it.

This would then mean that posts in restricted areas can be read very easily through Google's cache.

The other problem is security. If you are letting spiders into restricted areas based on IP address and the header data that the spider uses, it would be very easy for a hacker to gain access to a restricted area of your forum by simply masking their IP address and changing the header of their browser to be identical to that of a spider.






Posted By: Bluefrog
Date Posted: 04 December 2003 at 7:46am

For the thing about restricted areas being visible - yes - that would be true.

For security problems, I don't agree.

Spoofing a User Agent can work one way (client to server) [ and the reverse is true but not worth discussion here ], and spoofing IP addresses can only work one way (client to server OR attacker to attackee).

The way around spoofers is to first check the UA, then check the IP. If the IP does not match an approved one, deny access to the site. Otherwise, allow it. That way, only real bots / spiders are allowed.

Spoofing IP addresses is only good for DoS/DDoS and similar attacks. It cannot be used to gain access, because the server's replies go to the spoofed address and never reach the attacker.

I.e. IP address verification is sufficiently secure.
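The UA-then-IP check described above could be sketched roughly as follows, in Python rather than ASP. Everything named here is an assumption for illustration: the function name, the table layout, and above all the address ranges, which are reserved documentation ranges and not real search engine addresses.

```python
import ipaddress

# Hypothetical allowlist. The networks are documentation ranges (RFC 5737),
# NOT real spider addresses; a real list would have to be maintained by hand.
APPROVED_SPIDERS = {
    "googlebot": ["192.0.2.0/24"],
    "slurp": ["198.51.100.0/24"],
}


def is_approved_spider(user_agent, remote_ip, approved=APPROVED_SPIDERS):
    """Return True only if the UA names a known spider AND the IP is approved for it.

    remote_ip should be the address of the connection as seen by the server,
    not a value taken from a request header, which the client controls.
    """
    try:
        address = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # not a valid IP address at all

    ua = user_agent.lower()
    for spider_name, networks in approved.items():
        # Step 1: the UA must claim to be this spider.
        # Step 2: the IP must fall inside one of that spider's approved networks.
        if spider_name in ua and any(
            address in ipaddress.ip_network(network) for network in networks
        ):
            return True
    return False


if __name__ == "__main__":
    print(is_approved_spider("Mozilla/5.0 (compatible; Googlebot/2.1)", "192.0.2.15"))
    print(is_approved_spider("Mozilla/5.0 (compatible; Googlebot/2.1)", "203.0.113.9"))
```

A spoofed User Agent on its own fails the second step, which is the argument being made here; whether the IP check is trustworthy enough is exactly what the rest of the thread disputes.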

This is NOT something you would want to do for some things, but it would provide an incentive for people to sign up at forums. A similar tactic is commonplace at some current major sites.

They allow search engine visitors, but disallow internal searches without sign-ups.

The whole thing is quite clever I think.

 





Posted By: WebWiz-Bruce
Date Posted: 04 December 2003 at 8:18am
It's quite simple to make your browser look like one of Googlebot's spiders and make the IP a matching one for a Googlebot.

I have a tool on my computer that can do it.

This would mean that if the system you mention is implemented I could go to any Web Wiz Forums and go straight into any restricted areas. Very simple to do.

IP addresses and user agent strings are a completely insecure way to test whether a user is permitted.




Posted By: psycotik
Date Posted: 12 December 2003 at 3:03am
Also, Google has a cache, which means you can view web pages (i.e. restricted content) without even having to connect to the server.


Posted By: Bluefrog
Date Posted: 14 December 2003 at 2:58am

Often the point of a "restricted" area is not so much to restrict it, but to "dangle a carrot". In this case, you can allow Google to index it, and when users click the Google link, they have to "register". The part about the cache isn't really that important for this situation because the users can only view the cached pages and can't actually "view the site". They still need to register to view the rest.

I still don't see how spoofing IPs and User Agents matters, because once you spoof the IP, the only thing you can do is a DOS or DDOS attack, which is such an immature thing to do, not to mention a waste of time. (User Agent spoofing means nothing for security - i.e. Basing trust on a User Agent is foolhardy for security purposes - it is merely "informational".)

Let's face it, for 99% of web stuff, we really don't need all that much security. Look at the level of security built into WWF - there's quite a bit, but really, how many people actually need that level of security?

 





Posted By: ljamal
Date Posted: 14 December 2003 at 7:46am
I have 2 restricted areas.
1 is solely for moderators, so that we can discuss issues regarding the message board such as user bans and other stuff that NO ONE but moderators and admins should see.

The other one is for XML development and even some mods don't have access to that area as it is only for the XML developers.

In either of those cases, if a spider had access to those forums, it would completely defeat the purpose of restricting them. What it sounds like is that you want to dangle a carrot in front of non-members by having a restricted area that they can't visit, but one that benefits from search engine registration. No web board comes with that feature, so if you want it, you will have to learn WWF (or some other board) well enough to build that mod, pay someone to build the mod for you, or wait until someone develops it for themselves and releases it as a mod. Personally, I don't think anyone would ever release it as a mod: there is too much work, and done incorrectly it would add a security hole to WWF. Even done correctly it could add a security hole to WWF.

If you believe that WWF has "too much security for 99% of web stuff", then you have done NOTHING that needed security. Security is one of the largest issues on the web, and if you can give your users a sense of security, you have gone a long way towards ensuring repeat visitors.



Posted By: eksimba
Date Posted: 15 December 2003 at 11:21am

I, too, fail to see the point in having a 'restricted' area into which a public search engine can see. It wouldn't be restricted anymore, by definition.

Why don't you just dangle your 'carrots' in a non-restricted area of your forum, open to search bots, and put the juicy stuff into a restricted portion of your forum available only to registered users? That would seem to meet your needs.



-------------
- eric


Posted By: Bluefrog
Date Posted: 29 December 2003 at 11:51am

I'm having a terribly hard time coming to grips with the absolute inability of anyone to understand the basic concept. This isn't that tough to grasp...

There are times when security is important, and other times when security is not. When it is NOT critical, then why not do it?

And as for posting in other places in the forum, that doesn't work because the content that people are looking for will not be indexed by the bot.

And yes. I have done a lot of security work. I've done B2C webstores and B2B sites as well. And I have NEVER had any problems.

You're all thinking about this in far too "programmer"-like a manner. 99.99% of surfers cannot even imagine that when they click the search engine link and get a "Please register" notice, they can get the actual results from the search engine cache. You're overestimating the general internet population. Most people don't know what the "Home" key is for, much less what a cache is.

The point is, that in order to make yourself visible on the web for people to find out who you are, they have to find out through "indirect means", i.e. search engines, word of mouth, etc. Search engines are by far the most effective way to "advertise" yourself.

A general forum with content restricted to registered members only is not uncommon. There are lots like it. The reason being that they want you to sign up and participate. Once you are signed up, you are far more likely to post. It is generally bad practise to allow anonymous posting because that attracts spammers and other lowlifes.

As for security, it would not add any kind of a hole at all. [ See above. ] It should ONLY be added to portions of a forum where you want people to register. e.g. Imagine a "car audio" forum about subwoofers and all the latest gadgets. How many car audio buffs do you think are accomplished programmers? A lot less than the number of people who visit WWF I would imagine.

As for IP-based security... it works. Spoofing is only useful for DoS or DDoS. I'm a whitehat... I'm not interested in blackhat stuff, although I am confident I could do it. (I've hacked sites by 'accident' on more than one occasion - never doing any damage of course... purely curiosity.)

I'm going to go beat my head against the wall for a bit...

But before I do, let me just state that season 6, "Once More with Feeling" is the absolute best Buffy episode ever  

Spike wrote:

First he'll kill her
Then I'll save her
...
No, I'll save her
Then I'll kill her!

Willow wrote:

I think this line's mostly filler

 

 





