Web Wiz - Green Windows Web Hosting


Topic Closed: How about allowing bots/spiders?

Poll Question: Would you like to allow spiders to index restricted areas?
Poll Choice Votes Poll Statistics
1 [25.00%]
1 [25.00%]
2 [50.00%]
0 [0.00%]
This topic is closed, no new votes accepted

Bluefrog
Senior Member

Joined: 23 October 2002
Location: Korea, South
Status: Offline
Points: 1701
Topic: How about allowing bots/spiders?
Posted: 03 December 2003 at 10:29am

Since WWF (Web Wiz Forums) is basically server-side, bots won't be able to see the content on its pages. What about a mod or option to allow bots (e.g. 'googlebot') to search those pages and index them? It could be googlebot or slurp or whatever. Just spiders...

I can contribute via code, spider names, and spider IP addresses. Google has at least 150 spider IPs that I know of.

Ideas? Comments?

 

WebWiz-Bruce
Admin Group
Web Wiz Developer

Joined: 03 September 2001
Location: Bournemouth
Status: Offline
Points: 9844
Posted: 03 December 2003 at 10:54am
You are wrong. Googlebot and other spiders request pages over HTTP, so each page is run through ASP.DLL first and the spider sees the built-up page the same way you and I do through a browser.
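
As a rough illustration of that point (a minimal Python sketch, not the forum's actual ASP code): the server assembles the HTML before anything is sent over HTTP, so the response is the same whether a browser or a spider asked for it. The `app` function, the `POSTS` data, and both user-agent strings are hypothetical.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical topic data standing in for the forum database.
POSTS = [("Bluefrog", "How about allowing bots/spiders?")]


def app(environ, start_response):
    """Tiny WSGI app: the page is assembled on the server for every request."""
    rows = "".join(f"<li><b>{author}</b>: {text}</li>" for author, text in POSTS)
    body = f"<html><body><ul>{rows}</ul></body></html>".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]


def fetch(user_agent):
    """Call the app in-process, the way a server would for one GET request."""
    environ = {"HTTP_USER_AGENT": user_agent}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, PATH_INFO, etc.
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")


browser_page = fetch("Mozilla/5.0 (hypothetical browser)")
spider_page = fetch("Googlebot/2.1 (hypothetical spider)")
```

Both calls return the same status and the same finished HTML, because nothing in `app` looks at who is asking.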

If you do some searches on Google you will find that many of the pages in this forum are indexed by Google.

I found one just now. Click the link below to see a page of this forum that has been indexed by Google; it should be the first in the returned results:

http://www.google.com/search?q=WWF+hack&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8


Edited by -boRg-

Bluefrog
Senior Member

Joined: 23 October 2002
Location: Korea, South
Status: Offline
Points: 1701
Posted: 03 December 2003 at 7:47pm

Perhaps I wasn't clear enough.

I only put "restricted" in the poll question, not in the title or body. I mean restricted pages, i.e. pages that you must have permission to view.

That would make the content available in search engine caches, which may not be appropriate for some things, but for a lot of purposes it should be fine.

E.g. a members-only forum is restricted, but we allow Google to peek in on it. Users arriving from search engines would still need to log in before viewing the content.

For security, access can be tied to the spiders' IP addresses, so that spoofing the User Agent alone is not enough.

etc... etc...
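
The access rule proposed above could be sketched roughly as follows. This is only an illustration in Python, not Web Wiz Forums code; `Decision`, `decide_access`, and its three boolean parameters are hypothetical names, and how a spider gets "verified" is left to the caller.

```python
from enum import Enum


class Decision(Enum):
    SHOW_CONTENT = "show content"
    REDIRECT_TO_LOGIN = "redirect to login"


def decide_access(forum_is_restricted, is_logged_in_member, is_verified_spider):
    """Decide what to do with one request for a forum page."""
    if not forum_is_restricted:
        # Public forums are visible to everyone, spiders included.
        return Decision.SHOW_CONTENT
    if is_logged_in_member or is_verified_spider:
        # Members see the content; approved spiders may index it.
        return Decision.SHOW_CONTENT
    # A visitor arriving from a search result must log in first.
    return Decision.REDIRECT_TO_LOGIN
```

For example, `decide_access(True, False, False)` sends an anonymous search-engine visitor to the login page, while `decide_access(True, False, True)` lets a verified spider read the page.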

 

ljamal
Mod Builder Group

Joined: 16 April 2003
Status: Offline
Points: 888
Posted: 03 December 2003 at 10:09pm
But Google uses caching, so if the area was restricted and Google was allowed in, everyone would be able to peek at the cached page at Google, which totally defeats the purpose of having a restricted forum.

WebWiz-Bruce
Admin Group
Web Wiz Developer

Joined: 03 September 2001
Location: Bournemouth
Status: Offline
Points: 9844
Posted: 04 December 2003 at 2:49am
I agree. Have a look at the cached link at the bottom of Google's search results: it caches the whole page, which is useful if the site is down or has changed since Google indexed it.

This would mean that posts in restricted areas could be read very easily through Google's cache.

The other problem is security. If you let spiders into restricted areas based on IP address and the header data that the spider sends, it would be very easy for a hacker to gain access to a restricted area of your forum by simply masking their IP address and changing their browser's headers to be identical to those of a spider.
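
The header half of that concern is easy to illustrate. In the minimal Python sketch below, the client simply chooses whatever User-Agent it wants before the request is sent; the URL is a hypothetical placeholder and the agent string is only an example of what a spider might send. It says nothing about the IP address half.

```python
import urllib.request

# Example spider-style agent string; any text at all could be used here.
SPIDER_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"

# Hypothetical restricted page. Building the request does not contact it.
request = urllib.request.Request(
    "http://forum.example/restricted_forum.asp",
    headers={"User-Agent": SPIDER_UA},
)

# urllib stores header names capitalised as "User-agent".
claimed_agent = request.get_header("User-agent")
```

The server only ever sees `claimed_agent`, so a check on the header alone cannot tell this client from a real spider.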


Bluefrog
Senior Member

Joined: 23 October 2002
Location: Korea, South
Status: Offline
Points: 1701
Posted: 04 December 2003 at 7:46am

As for restricted areas being visible through the cache: yes, that would be true.

As for the security problems, I don't agree.

Spoofing a User Agent works one way (client to server) [the reverse is also true, but not worth discussing here], and spoofing an IP address can only work one way (client to server, or attacker to attackee): the replies go to the real owner of the address, not to the spoofer.

The solution around spoofers is to first check the UA, then check the IP. If the IP does not match an approved one, then deny access to the site. Otherwise, allow. That way, only real bots / spiders are allowed.
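
A minimal sketch of that two-step check, in Python rather than the forum's ASP, might look like this. `APPROVED_SPIDERS` and `is_verified_spider` are hypothetical names, and the address ranges are reserved documentation networks used as placeholders, not real spider addresses.

```python
import ipaddress

# Hypothetical allow-list: spider name fragment -> approved networks.
# These are reserved documentation ranges, NOT real spider IP addresses.
APPROVED_SPIDERS = {
    "googlebot": [ipaddress.ip_network("192.0.2.0/24")],
    "slurp": [ipaddress.ip_network("198.51.100.0/24")],
}


def is_verified_spider(user_agent, remote_ip):
    """Step 1: match the User-Agent. Step 2: match the IP for that spider."""
    ua = user_agent.lower()
    try:
        address = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # not a parseable IP address, so deny
    for name, networks in APPROVED_SPIDERS.items():
        if name in ua:
            # The agent claims to be this spider; the IP must agree.
            return any(address in network for network in networks)
    return False  # not a known spider at all
```

For this to mean anything, `remote_ip` would have to be the address of the connection itself (for example the server's `REMOTE_ADDR` value), not something taken from a header that the client supplies.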

Spoofing IP addresses is only good for DoS/DDoS and similar attacks. It cannot be used to gain access.

I.e., IP address verification is sufficiently secure.

This is NOT something you would want to do for everything, but it would provide an incentive for people to sign up at forums. A similar tactic is commonplace at some current major sites.

They allow search engine visitors, but disallow internal searches without sign-ups.

The whole thing is quite clever I think.

 

WebWiz-Bruce
Admin Group
Web Wiz Developer

Joined: 03 September 2001
Location: Bournemouth
Status: Offline
Points: 9844
Posted: 04 December 2003 at 8:18am
It's quite simple to make your browser look like one of Google's spiders and to make the IP address match one of Googlebot's.

I have a tool on my computer that can do it.

This would mean that if the system you mention were implemented, I could go to any Web Wiz Forums site and walk straight into any restricted area. Very simple to do.

IP addresses and user agent strings are a completely insecure way to test whether a user is permitted.

psycotik
Groupie

Joined: 27 November 2003
Status: Offline
Points: 73
Posted: 12 December 2003 at 3:03am
Also, Google has a cache, which means you can view web pages (i.e. restricted content) without even having to connect to the server.

Forum Software by Web Wiz Forums® version 12.08
Copyright ©2001-2026 Web Wiz Ltd.

