Robots.txt file project

radic (Newbie, joined 24 February 2005)
Posted: 02 July 2005 at 8:59pm
Hi,
 
I would like to add all of the files from the forum, except for pages like default.asp, forum_topics.asp, forum_posts.asp, etc., to my site's robots.txt file. Instead of doing this from scratch, I would like to see if anyone else has one already completed and could share it with the community.

If you want Googlebot etc. to index more files, and more often, then this is vital.

If you want traffic and high SEO listings then this is vital.

Duval (Newbie, joined 02 December 2004)
Posted: 03 July 2005 at 4:04pm
Radic, what makes you think that by excluding files you'll get spidered more thoroughly? Generally, the robots.txt exclusion is for non-public areas of your site or for pages where you have concerns about duplicate content.

The single best thing one can do with a forum to improve search engine spidering is to rewrite the URLs: http://www.isapirewrite.com/
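
To make the idea of URL rewriting concrete, here is a minimal Python sketch of the mapping a rewrite rule performs: a static-looking path that spiders see is translated into the dynamic script URL behind it. The friendly path pattern and the TID parameter are assumptions made up for illustration, and this is not ISAPI_Rewrite's own rule syntax.

```python
import re

# Hypothetical URL scheme: the friendly pattern and the "TID" parameter are
# assumptions for illustration, not the forum's documented routes.
FRIENDLY = re.compile(r"^/forum/topic-(\d+)\.html$")


def rewrite(path: str) -> str:
    """Map a static-looking path to the dynamic script URL behind it."""
    match = FRIENDLY.match(path)
    if match is None:
        return path  # anything that is not a friendly topic URL passes through
    return f"/forum/forum_posts.asp?TID={match.group(1)}"


print(rewrite("/forum/topic-1234.html"))  # /forum/forum_posts.asp?TID=1234
print(rewrite("/forum/default.asp"))      # /forum/default.asp
```

In a real setup the web server applies a rule like this before the request reaches the script, so the query string never appears in the links that get spidered.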

Here's a link to the exclusion protocol http://www.robotstxt.org/wc/norobots.html

Please repost any specifics if you run into difficulty.
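
As a rough illustration of the exclusion use described above, the sketch below feeds a small robots.txt to Python's standard urllib.robotparser and asks which paths a conforming bot may fetch. The rules and the directory and file names are hypothetical examples, not a recommended file for this forum.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the directory and file names are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/admin/
Disallow: /forum/includes/
"""


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if a conforming bot named `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


for path in (
    "/forum/forum_posts.asp?TID=1234",  # public content, no rule matches
    "/forum/admin/login.asp",           # non-public area, disallowed
    "/forum/includes/header.asp",       # include directory, disallowed
):
    print(path, allowed(path))
```

Only the non-public areas are listed; the content pages need no entry at all, because anything not disallowed is fetchable by default.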

wistex (Mod Builder Group, joined 30 August 2003, United States)
Posted: 03 July 2005 at 8:31pm
BTW, if you press the space bar after typing a URL, it makes it clickable.  Here's a clickable version of the links provided above: http://www.isapirewrite.com/ and http://www.robotstxt.org/wc/norobots.html
 
The key to getting your site indexed is content, content, content.  Yes, there are tricks to get Google and other search engines to understand your site better and therefore rank it higher, but content still comes first.  On one of my sites, we have not used any SEO tricks, and yet we rank in the top 10 on Google, Yahoo! and MSN in some of the appropriate categories.  Actually, we do the opposite of what some SEO people say, yet we rank in the top 10.  Why?  The content is good and people keep coming back, and search engines have ways to figure this out.
 
And excluding files won't help you get indexed. Actually, it will probably hurt you.  The larger your site, the more of an authority you are, and that affects your ranking.  So excluding files would probably make you less of an authority, since your site will appear smaller to search engines.

radic (Newbie, joined 24 February 2005)
Posted: 04 July 2005 at 9:45am
OK, thanks for the feedback. I'm quite surprised that you both think it's better not to use the Disallow command for files that you don't want shown on Google etc.

I'm well aware that content is king and have been studying SEO quite a bit, but there is so much to know. I mean, what would be the point of indexing a file that has no content, like an included header, or one of the forum files that has no use or could produce an error if landed on?

I would much rather have the robot come in and index all the forum posts and boards than let it index all these include or function files. I think you need to make it easy and clear for robots what they should do with your site, and not throw hurdles in their path.

It's also a waste of bandwidth, although that's not the point. The robots will turn away and not index a site properly if they have to deal with too much junk. They just want the content files, not the includes, and not the errors caused by landing on an include file. You also don't want these files for everyone to see on Google etc.; you want users to find the forum posts where the keywords are.

Anyway, that's just how I see it, but I'm very interested to hear more.
 
 

dpyers (Senior Member, joined 12 May 2003)
Posted: 04 July 2005 at 12:25pm
The SE bots follow links. They don't actually "walk" a directory tree. You don't link to an include file so the bots never see it.
The first time a bot visits, it does a "shallow" scan, usually only one or two levels deep from your home page. On subsequent visits, it scans deeper and deeper. It can take months to fully scan a deep site.
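
A small Python sketch of the link-following behaviour described above: the only URLs a crawler learns about from a page are the ones that appear in its hyperlinks, so a file that nothing links to is never discovered. The page markup, the query parameters and the example.com address are invented for illustration, and real spiders are of course far more elaborate.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def discover(html: str, base_url: str) -> list[str]:
    """Return the URLs a link-following bot could learn from this page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical forum home page: it links to topics and posts, never to includes.
PAGE = """
<a href="forum_topics.asp?FID=1">General</a>
<a href="forum_posts.asp?TID=42">Welcome</a>
"""

found = discover(PAGE, "http://example.com/forum/default.asp")
for url in found:
    print(url)
```

Nothing under an includes directory shows up in the result, simply because the page never links to it.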

Pages that change frequently are visited more often by the bots. Once they get past your main forum page, they'll see pages that change frequently and keep coming back more often. I've seen forums that get scanned twice a day.

One way of getting your site fully indexed quicker is to include a site map on your front page. Another way to get a forum site indexed is to include a link to active topics on the home page.

The robots.txt exclude functions are used to prevent the bots from following links to those directories, not to prevent walking the directories, which is something bots can't do if you've turned off directory views for your site.

One useful thing that excluding directories and files in robots.txt does is keep things you want hidden out of the search engines, like keeping your images out of Google.

Note that robots.txt is only useful for conforming search engines. Bad bots either ignore it or use it to identify areas where "good stuff" might be kept.

Lead me not into temptation... I know the short cut, follow me.

wistex (Mod Builder Group, joined 30 August 2003, United States)
Posted: 04 July 2005 at 7:35pm
dpyers is right: the search engines only follow and index links.  They don't index any file that no one links to directly.  I've never had a problem with Google or any of the others linking to any header or footer files.
 
Remember, the bots can't read your directory structure, they can only follow hyperlinks in a webpage.

dpyers (Senior Member, joined 12 May 2003)
Posted: 05 July 2005 at 9:05pm
Something new from Google that you may be interested in:
https://www.google.com/webmasters/sitemaps/login


radic (Newbie, joined 24 February 2005)
Posted: 06 July 2005 at 8:58am
Originally posted by dpyers:

Something new from google that you may be interested in
https://www.google.com/webmasters/sitemaps/login
 
 
Hahaha, I've been busy working on these for my sites for the last few days, since I heard of this. So the point of this post is not really an issue anymore if this Google Sitemap thing works. I got a Snitz forum site map indexed yesterday and I'm now working on ones for the rest of my sites... ;)

dpyers, OK, I see what you're saying now. Thanks for that information; it was explained well.
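
For anyone wanting to try the same thing, here is a minimal sketch of generating an XML site map with Python's standard library. It assumes the urlset/url/loc layout and namespace of the current sitemaps.org protocol, which may differ from the schema Google accepted in 2005, and the forum URLs are made-up examples.

```python
import xml.etree.ElementTree as ET

# Namespace of the current sitemaps.org protocol (an assumption here; the
# schema Google accepted when this thread was written may have differed).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a sitemap document listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # "&" is escaped on output
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical forum URLs, for illustration only.
sitemap_xml = build_sitemap([
    "http://example.com/forum/default.asp",
    "http://example.com/forum/forum_posts.asp?TID=42&PN=1",
])
print(sitemap_xml)
```

In practice the URL list would come from the forum database rather than being typed by hand, so that new topics are picked up automatically.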