Search Box needed for own site

ray12

Newbie

Joined: 31 July 2003
Status: Offline
Points: 11

Posted: 27 August 2003 at 8:33pm

Croco Dylan wrote:

Found a JavaScript search box (tipue) which uses a datafile. Just enter all info, description and keywords and it works. So, no need to download all pages and no need for any script or program to "read" all original HTML files.

Actually, the data file is still being downloaded. When I say "download" I don't mean the user has to click on a file and save it manually. What I mean is that the file must be transferred from the server to the client. It's like placing an image on your page: the browser still has to download the image file before it can display it. Same thing here: when you have a <script src="data.js"> at the top of your page, the file must be downloaded before the page can be rendered.

Now this is only a problem if the data file is big. In this case it could be fast because it's a small file: since you said you enter the datafile manually, I assume there really isn't that much data in there. I guess this also means it relies on your manually entered keywords? So it doesn't actually use the words on the page, just the keywords you type in. That means plenty of words are not included in the search, so it's quite limited. But again, if your site is like 6-10 pages, you won't really notice until it grows to 30 or more pages...

Croco Dylan wrote:

Probably I'll see if I can make a simple Perl script or use a Korn shell script to read all the HTML files, put meta tags in certain fields and arrays and let my (Perl) script update or change the contents of the datafile.

That's a good option, you can even get the perl script to rip unique words out from the html files for your keywords.
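As a rough illustration of ripping unique words out of a page, here is a minimal sketch in Python rather than Perl (the language swap is mine, and `TextExtractor`, `unique_words` and the sample page are made-up names for this example). It only looks at visible text and ignores anything inside script or style blocks.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def unique_words(html, min_length=3):
    """Return the sorted set of lowercase words found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    return sorted({w for w in words if len(w) >= min_length})


# Hypothetical sample page, just for the demonstration.
page = (
    "<html><head><title>Pirates</title><style>b {color: red}</style></head>"
    "<body><p>Pirates went <b>sailing</b>; sailing was hard.</p></body></html>"
)
print(unique_words(page))
```

A real script would probably also want a stop-word list so that words like "was" don't end up as keywords.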

The Zoom thing I pointed at before has a similar JavaScript search, which requires no server-side stuff. In addition to that, it has a Windows program which creates the data file for you by reading your HTML pages and actually using the words it finds on the pages.

Croco Dylan

Newbie

Joined: 20 August 2003
Location: Netherlands
Status: Offline
Points: 34

Posted: 28 August 2003 at 4:40am

ray12 wrote:

Actually, the data file is still being downloaded.

As to the size of the datafile, here's what the author's got to say on this

In our experience, the details for five pages tends to take up approximately 1Kb. For example, a site with 75 indexed pages would require a 15Kb data file.

Of course only time will tell how this will turn out. I do plan to use a lot more keywords. I'm not sure about my keyword policy and may use more internal keywords than 'just' the tags (for bots to pick up). Examples? A search within (future) restricted sections of the site. These pages are not meant to be found through Google or such, so there's no need for the keywords there. Internally I would want to use them. I could use one txt file where lines could read like this:

someurl;tipue;keyword1, keyword2, ...
sameurl;bots;keyword1, keyword2, ....
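To make the two-line format above concrete, here is a small Python sketch that reads such lines into a lookup table. The function name `parse_keyword_lines` and the sample URL and keywords are assumptions for illustration; only the `url;target;keywords` layout comes from the post.

```python
from collections import defaultdict


def parse_keyword_lines(lines):
    """Map each URL to {target: [keywords]} from 'url;target;kw1, kw2' lines."""
    pages = defaultdict(dict)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        url, target, keywords = line.split(";", 2)
        pages[url][target] = [k.strip() for k in keywords.split(",") if k.strip()]
    return dict(pages)


# Hypothetical sample data in the format described above.
sample = [
    "art/pirates.htm;tipue;pirates, sailing, treasure",
    "art/pirates.htm;bots;pirates, sailing",
]
print(parse_keyword_lines(sample))
```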

A simple Perl or ksh script could take this file and update both my datafile and the HTML files themselves, then see which files have been modified and upload these. Of course I could make this script as smart as needed, or use several scripts in conjunction with each other. Some examples: extract all text on pages which is printed in bold or a contrasting font, extract unique or duplicate words, etcetera. Output can go to one or more files. Let's say generate a list of keywords used and the number of occurrences. Perhaps this could even be done in conjunction with user feedback without having to resort to server-side scripting: count and identify keywords as used by visitors and append the info to a file on the server or send it by mail. One can think of many things up front, but which policies and rules to enforce in optimizing the site, and how to evaluate certain data, can be better understood (and used) after a certain amount of time has passed.
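The "list of keywords used, and the number of occurrences" report could look roughly like this in Python. It is only a sketch: `keyword_report` is a made-up name, and it counts the number of pages each keyword is assigned to, which is one possible reading of "occurrences".

```python
from collections import Counter


def keyword_report(keyword_lists):
    """Count how many pages use each keyword (case-insensitive).

    Returns (keyword, count) pairs, most used first, ties in alphabetical order.
    """
    counts = Counter()
    for keywords in keyword_lists:
        # A set, so a keyword repeated on one page is counted only once.
        counts.update({k.strip().lower() for k in keywords})
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical keyword lists for three pages.
report = keyword_report([
    ["pirates", "sailing"],
    ["pirates", "history"],
    ["Sailing", "ships"],
])
for word, count in report:
    print(f"{word}: {count}")
```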

ray12 wrote:

But again, if your site is like 6-10 pages, you won't really notice until it grows to 30 or more pages...

As to the size of the site? Hmm ... hard to say. I plan to start in a month and a half with at least 40 or so stories, some short, some long. I call them stories and not pages, since a story may be spread over several pages (small files, fast loading). Perhaps some will get a 'give your comments' page or a poll for users (rate this page) and/or a source page (info found in these publications). Although maybe not 2 full pages, still some space is needed. Let's say an average of 2 or 3 pages per story. 3 pages times 40 stories makes at most 120 lines in the datafile, which according to tipue (about 1Kb per five pages) would result in a datafile of 24k. Which isn't that bad. Even if in the first 6 to 12 months I were able to double the content (40 to 80 stories), that would still be acceptable.

If the need arose for faster searching, I could always split up the site into several sections and have more than one search box: one for each section with its own datafile, and a master search box for top-level searching. The separate datafiles could still be filled automatically, based on e.g. path (a directory is a section). Although I'm not particularly fond of such a solution, it still might be used as a fallback until you come up with something better.

In time, I'll have to upgrade to some form of server-side scripting or even running (e.g.) Apache on a PC here at home, especially considering all the (long-term) wishes I have. No question about this whatsoever! For now the most important thing is to come up with enough (interesting) content to make the site worthwhile, and to keep adding enough to get users to come back every once in a while. This is absolute priority number one. As long as this goal hasn't been reached, I think it would be a waste of time to put much effort into creating the technically best solutions. No need for a search box or site map if you've only got two pages.

So what will do for now, or at least until I resort to more advanced techniques? A basic structure and logic which can also be used with more advanced stuff, such as server-side scripting. Enough flexibility to upgrade, and for now as dynamic and generic as possible.

When you come to think of it, it's all just pretty basic ground rules. Although we sometimes tend to forget these, meaning we have to reinvent the wheel every time we want something new. Just a waste of time, ain't it?
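The arithmetic above, written out as a tiny Python sketch. The five-pages-per-kilobyte figure is the rule of thumb quoted from the tipue author earlier in the thread; the function name is made up, and real sizes will depend on how many keywords each entry carries.

```python
import math

PAGES_PER_KB = 5  # quoted rule of thumb: details for five pages take about 1 Kb


def estimated_datafile_kb(stories, pages_per_story):
    """Estimate the datafile size in Kb from the number of indexed pages."""
    pages = stories * pages_per_story
    return math.ceil(pages / PAGES_PER_KB)


print(estimated_datafile_kb(40, 3))  # 120 pages
print(estimated_datafile_kb(80, 3))  # content doubled: 240 pages
```

With these assumptions, the starting site comes out at 24 Kb and the doubled site at 48 Kb.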
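Filling the per-section datafiles "based on path" could start from a grouping step like the Python sketch below. `group_by_section` and the sample paths are assumptions for illustration; writing each group out in the search script's own datafile format is left out, since that format isn't described here.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_section(page_paths):
    """Group page paths by their top-level directory (the 'section')."""
    sections = defaultdict(list)
    for path in page_paths:
        parts = PurePosixPath(path).parts
        # Pages that sit directly in the site root get their own bucket.
        section = parts[0] if len(parts) > 1 else "(root)"
        sections[section].append(path)
    return dict(sections)


# Hypothetical site layout.
print(group_by_section(["art/pirates.htm", "art/seals.htm", "polls/rate.htm", "index.htm"]))
```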

ray12

Posted: 29 August 2003 at 1:41am

Croco Dylan wrote:

As to the size of the datafile, here's what the author's got to say on this

In our experience, the details for five pages tends to take up approximately 1Kb. For example, a site with 75 indexed pages would require a 15Kb data file.

Yeah that's mostly true for that search script, because it only contains an index of words you manually enter (assuming you don't feed it 70,000 words with your perl scripts). I was describing something more akin to a real search engine, where all of the text on a page is indexed - so any word that appears on a page is searchable.

If you are happy with limited searching (matching only certain words on a page, or just words in the meta keywords tag), then by all means, it won't be much of an issue at all and you should go with it.

Croco Dylan wrote:

Absolutely true. Too many webmasters forget that and the mass of garbage on the Internet is the result of it. The most important thing is to have material that people want. Usability's important, but content is everything.

Although it's a good thing that you've sorta prepared and looked into this sorta thing at an early stage too y'know. Because when, in a few years' time, your site has grown to 500 pages and you want to add a search function... and realise that you have to go through 500 pages and give them all META descriptions and META keywords... and specify proper titles for those pages loading in frames and whatever else... it'd be a nightmare. So it's all good really :)

Croco Dylan wrote:

There's a lot of reinventing of the wheel out there, definitely. But at the same time, everyone has slightly different requirements and sometimes DIY can be good too. Most of the time, if you look around the web long enough, you'll find something that does what you need.

Croco Dylan

Posted: 29 August 2003 at 3:47am

Hooray, a kindred spirit I seest beforst me

ray12 wrote:

Meant that for your own wheels. As in when first designing and building a site (or something else for that matter). Wishes 1 and 2 are must-haves and 3's pretty high on your list. 1 and 2 can be fulfilled by using some tips or tricks you've found. When, let's say a whole lot later, you also want to have wish number 3 (or 4, 5, 6, 7), you (have to) decide on something which isn't compatible with your already implemented code for wishes 1 and 2. You have to come up with something new for 1 and 2. Not so smart.

If you had anticipated things better, it might have taken you more effort and time at the start, but you would've found a generic solution for wishes 1 and 2. On one hand you don't want to spend too much time & effort on (possibly never needed) wishes, especially where money (or time) is a big issue. On the other hand you want to anticipate future wishes & developments the best you can, especially where this may give you a head start on competitors.

As to

ray12 wrote:

where all of the text on a page is indexed - so any word that appears on a page is searchable.

I tend to use off-topic (key)words or double meanings just to be able to link to some other page. Topic-specific words can be found on pages that have nothing to do with that particular topic. These words are only used so I can link to other pages. Say the term "seal of approval", which could easily be on a disclaimer page. I would however use this to link to a picture of a seal (the animal) and from there link that to, let's say, greenpeace, or to a story about arctic explorations, or to a page on seals & croft or ... or ...

The purpose being, of course, to surprise visitors by the unusual linking. Whether it's effective or even any fun this way? Time will tell.

Back to the search box. A search on all text within pages for a topic-specific word might also produce a lot of pages that have absolutely nothing to do with that topic. They're just there for 'surprise linking'. A search box should produce clean results, which wouldn't be the case on my site. Not if I searched on all text.

As to the use of scripts. Very useful reports can be generated, which text is where and how many times. Because of my unusual linking, I'll still have to judge which keywords, titles and descriptions really apply to the page.

ray12

Posted: 31 August 2003 at 10:29pm

Croco Dylan wrote:

Ah yes. Absolutely. Planning ahead and leaving room for future modifications is important... but also be careful of going too far, leaving things too general or wide-open, when a more focused solution could have been better.

Like a lot of forums don't bother working out how many new messages have been posted to each thread since the last time you visited. It's a bit excessive to check it for every thread (and remember the previous number for each thread), and some would be smart enough to only check it for the top 5 threads or something. But I've always found it to be one of the most useful things to have, and having it partially implemented can sometimes be very valuable, rather than waiting for the "complete solution" and leaving room for it. Hmm.. perhaps I'm making a different point now. :)

Croco Dylan wrote:

I see. That does throw a spin on things. Perhaps having both is still an option, with priority placed on the keywords defined manually? Most scripts and search engines cater for both the words on a page and the manually entered description and keywords.

The biggest problem with using only self-defined words is that you'd have to guess all the possible words that your visitors will type in. Usability-wise, if a lot of users try a couple of searches and get zero results (most web surfers are trained to reduce the "specific-ness" of their searches in order to get more results, and they expect that to work), they usually move on and assume it's broken. I know I do.

Using words off the page will usually find some results, of some relevance, whether it is of completely accurate relevance is another issue. But perhaps someone did want to find that link to greenpeace that they only remember was linked via a cute seal picture. They'd expect to be able to find it by searching for the seal. Not your priority of concern perhaps, but it's something to think about.
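One way to picture "having both, with priority placed on the keywords defined manually" is the Python sketch below. It isn't how tipue or Zoom actually rank results; the `search` function, the page table and its `keywords`/`words` fields are all assumptions made for the example.

```python
def search(query, pages):
    """Rank pages for one query word; manual keywords outrank page text.

    `pages` maps a URL to a dict with 'keywords' (manually entered) and
    'words' (extracted from the page body), both sets of lowercase words.
    """
    term = query.strip().lower()
    scored = []
    for url, entry in pages.items():
        if term in entry["keywords"]:
            scored.append((0, url))  # best: matches a manual keyword
        elif term in entry["words"]:
            scored.append((1, url))  # fallback: appears only in the page text
    return [url for _, url in sorted(scored)]


# Hypothetical index: the disclaimer page only mentions "seal" in passing.
pages = {
    "disclaimer.htm": {"keywords": {"disclaimer"}, "words": {"seal", "approval"}},
    "seal.htm": {"keywords": {"seal", "animal"}, "words": {"seal", "arctic"}},
}
print(search("seal", pages))
```

Here a search for "seal" still finds the disclaimer page, but lists it after the page whose manual keywords include the word.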

Croco Dylan wrote:

There is still a level of convenience though, I think, to define them in the META description on the same HTML page - as you would have to at one stage write the page itself, enter the title and all that anyway. When you want to modify it, you change it on the page itself. Instead of having to open up a different file, find the corresponding entry (eg: line 203), and type it in there.

Croco Dylan

Posted: 01 September 2003 at 1:36am

ray12 wrote:

To the above, point taken. You're right.

As to the off-topic keywords, this does present me with yet another (self-created) difficulty: how to avoid "dirty" search results? This is something I keep strugglin' with.

As to which set of keywords would be closest to what users expect, you'll probably understand that (in my case) user feedback will be badly needed. I just gotta have the users' feedback on this in order to optimize my own search box. Knowing which keywords they use, but also checking the popularity of certain pages (most popular pages on top in the search results?), or using polls (rate this page, did you find ...). A site like this one is always useful too: http://inventory.overture.com/d/searchinventory/suggestion/

As to where to define keywords. Of course, these should always be in the meta tags of the pages. But I might have two different policies here. One for the bots and external search engines, which is mostly "generic" qualifications. The 2nd policy, for internal use, has to include all the "generic info" but might in addition want more keywords or other descriptions. I don't want my page to show up in a Google result when someone searches on the term "greenpeace". Topic-wise my page has nothing to do with these guys, except for the link. Internally however, you should be able to search on greenpeace or seal.

If I wanted to modify the description, page title or keywords, I could do so in the HTML file itself. In which case bots would/might pick up the new words at their next visit. The nuisance here is of course that I'd have to change things in two different places: both in the datafile (internal search box) and in the original HTML file (for bots). Discrepancies are very easily "born" here, and to avoid such errors I'd have to (manually) check and double-check everything. A way too tedious task, and it will not prevent me from making mistakes (fatigue, editing too fast, accidentally cutting & pasting over another line, typos and ...). Looking for something better I came up with (let's call it) keyword.ini.

In the keyword.ini two lines for each page, one to define the generic keywords and such (for bots) and one for additional words which are only used internally (or all on the same line). Only the keyword.ini file is used to manually fill certain values (keywords). Next, a script could be used to check whether both the html files and the datafile (own search box) contain the right words as defined in my ini file. Simply take the line from the keyword.ini, and compare these with both the html files and the datafile (search box).

Something along the lines of

while (readline keyword.ini)
   extract name of html-file, fill fields keywords, description, page title
   open appropriate html file
   read lines from html file
    if line begins with <meta name="description"
     then compare descriptions (keyword.ini with html description)
      if different, then replace
       (same with keywords, page title, or whatever else)
   read lines from datafile (for search box)
    same procedure for the datafile as with html files above

This way, I only have to define things in one place, namely the keyword.ini file. The check-script could be set at timed intervals and it will change whatever needs to be changed. This script could also produce a list of changed files or trigger an incremental upload (changed files). The keyword.ini could be easily imported into any other file (such as csv or database), so with some easy scripting I could "translate" this file to whatever format needed when I'd upgrade to more advanced techniques. So everything I do in the ini file, can be used when I upgrade or start using other techniques.
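A minimal Python sketch of the compare-and-replace step and the "list of changed files" idea, using the description tag as the example. It works on page contents held in memory so it stays short; the function names are made up, and it assumes the tag is written exactly as `<meta name="description" content="...">` with double quotes. Reading the files and triggering an upload are left out.

```python
import re

# Assumed tag layout: name first, then content, both double-quoted.
DESCRIPTION_RE = re.compile(
    r'(<meta\s+name="description"\s+content=")([^"]*)(")', re.IGNORECASE
)


def sync_description(html, wanted):
    """Return (html, changed) with the meta description set to `wanted`."""
    match = DESCRIPTION_RE.search(html)
    if match is None or match.group(2) == wanted:
        return html, False  # no tag found, or already up to date
    start, end = match.span(2)
    return html[:start] + wanted + html[end:], True


def sync_all(pages, wanted_by_file):
    """Update in-memory pages in place; return the names of changed files."""
    changed = []
    for name, wanted in wanted_by_file.items():
        new_html, was_changed = sync_description(pages[name], wanted)
        if was_changed:
            pages[name] = new_html
            changed.append(name)
    return changed


# Hypothetical pages and wanted descriptions.
pages = {
    "a.htm": '<meta name="description" content="old text">',
    "b.htm": '<meta name="description" content="same text">',
}
print(sync_all(pages, {"a.htm": "new text", "b.htm": "same text"}))
```

The returned list is what an incremental upload would work from.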

As to a user coming back for a specific link (the seal), I could always generate a "site map" for hyperlinks. This is probably a good idea anyway, especially where I want to hide surprise links or menus (adds to the surprise effect). Just as with the keyword.ini, this could be put into a script. Run locally and upload if changed.
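The hyperlink "site map" could be generated along these lines. This is a Python sketch with made-up names (`LinkCollector`, `link_map`) and made-up sample pages; it only collects the href of each anchor tag and does not resolve relative URLs.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def link_map(pages):
    """Map each page name to the sorted, de-duplicated links it contains."""
    result = {}
    for name, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        result[name] = sorted(set(collector.links))
    return result


# Hypothetical pages with a "surprise" link chain.
site = {
    "disclaimer.htm": '<p>Our <a href="seal.htm">seal</a> of approval. <a href="seal.htm">Again</a>.</p>',
    "seal.htm": '<p>On to the <a href="arctic.htm">arctic</a>.</p>',
}
print(link_map(site))
```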

ray12

Posted: 01 September 2003 at 10:49pm

Croco Dylan wrote:

Hmm. So you need two sets of keywords for each page. I actually came across something that could work like this in Zoom but I've never had to use it. Some information here:

http://www.wrensoft.com/zoom/support.html#zoomwords

Basically, they seem to have a custom "zoomwords" meta tag, in addition to the usual "keywords" meta tag. The "zoomwords" are only acknowledged by zoom, and are ignored by Google and other search engines. This should let you keep both the public/"generic" keywords and the "dirty"/internal keywords defined on the same page. You might want to look into it.
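For anyone scripting around this, reading both tags off a page might look like the Python sketch below. It assumes the custom tag uses the ordinary `<meta name="zoomwords" content="...">` form with comma-separated words, which should be checked against the Zoom documentation linked above; `MetaCollector` and `page_keywords` are made-up names.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta name=... content=...> pairs, keyed by lowercase name."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            name = attr.get("name")
            if name and attr.get("content") is not None:
                self.meta[name.lower()] = attr["content"]


def split_words(value):
    """Split a comma-separated list into trimmed, non-empty words."""
    return [w.strip() for w in value.split(",") if w.strip()]


def page_keywords(html):
    """Return (public keywords, internal-only words) for one page."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    public = split_words(collector.meta.get("keywords", ""))
    internal = split_words(collector.meta.get("zoomwords", ""))  # assumed tag form
    return public, internal


# Hypothetical page head.
head = (
    '<head><meta name="keywords" content="pirates, history">'
    '<meta name="zoomwords" content="greenpeace, seal"></head>'
)
print(page_keywords(head))
```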

Or you could go with the keyword.ini file if you're feeling peckish for scripting :) That seems to still rely on having the words in a separate file from the HTML page though, I think? But then you said it'll check, so I assume it'll have to extract the 'correct'/updated text?

Sitemap definitely sounds like a good idea with the 'surprise' link naming system you're thinking of. Sounds like it's going to be quite a site! :)

Croco Dylan

Posted: 02 September 2003 at 12:52am

ray12 wrote:

That seems to still rely on having the words in a separate file from the HTML page though, I think? But then you said it'll check, so I assume it'll have to extract the 'correct'/updated text?

Not sure I clearly explained my intentions. So a simple example.

Let's say I have a page called pirates.htm in the articles directory, with the keywords "pirates, sailing" in the meta tags. Let's now say that I want to replace the keyword "sailing" with "history". How would I do this?

In this example the old line in the HTML file pirates.htm would read like
<meta name="keywords" content="pirates,sailing">

My keyword.ini file could read something like this
art/pirates.htm;[page title];[description];pirates,sailing;[own keywords]

If I need to change the keywords pirates,sailing to pirates,history I'd do so in the ini file by changing the line as typed above to

art/pirates.htm;[page title];[description];pirates,history;[own keywords]

The script reads all lines of the ini file, and for the line as typed above it will do the following.
1. Read the line.
2. Find and open (in this case) pirates.htm.
3. Search for lines beginning with <meta name="keywords" (..)
4. Compare the keywords as found in the ini file with the keywords as found in the HTML file.
5. If different, replace the old keywords with the new keywords (write to the HTML file or replace the text).

After the script has run, the same line in pirates.htm would be
<meta name="keywords" content="pirates,history">

Something along the same lines could be used for the datafile as used by my search box.
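The pirates.htm walk-through above, sketched in Python. The five-field line layout is taken from the example; the function names are made up, and the sketch assumes the page uses the standard `<meta name="keywords" content="...">` form with double quotes. It also assumes no field contains a semicolon, which a real keyword.ini would have to guarantee or escape.

```python
import re

# Assumed tag layout: name first, then content, both double-quoted.
KEYWORDS_RE = re.compile(
    r'(<meta\s+name="keywords"\s+content=")([^"]*)(")', re.IGNORECASE
)


def parse_ini_line(line):
    """Split 'file;title;description;keywords;own keywords' into a dict."""
    path, title, description, keywords, own = line.rstrip("\n").split(";")
    return {
        "path": path,
        "title": title,
        "description": description,
        "keywords": keywords,
        "own_keywords": own,
    }


def update_keywords(html, wanted):
    """Replace the content of the keywords meta tag if it differs."""
    match = KEYWORDS_RE.search(html)
    if match is None or match.group(2) == wanted:
        return html  # nothing to do
    start, end = match.span(2)
    return html[:start] + wanted + html[end:]


entry = parse_ini_line(
    "art/pirates.htm;[page title];[description];pirates,history;[own keywords]"
)
old_line = '<meta name="keywords" content="pirates,sailing">'
print(update_keywords(old_line, entry["keywords"]))
```

The same pair of functions would be called once per line of the ini file, with the page contents read from and written back to the file named in the first field.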

The script can be started either manually from the prompt or at timed intervals. Such scripts are all run locally. I could write the script so that it either runs for entire directories (all htm files) or, when given a filename as a parameter, only runs for that file (pirates.htm).
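Choosing between "all htm files" and "just the given filename" could be a small helper like this Python sketch. `select_files` is a made-up name, and it assumes pages use the .htm extension as in the example.

```python
from pathlib import Path


def select_files(root, filename=None):
    """Return the .htm files to process under `root`.

    With `filename`, only files with that exact name are returned;
    otherwise every .htm file in the directory tree is returned.
    """
    pattern = filename if filename else "*.htm"
    return sorted(Path(root).rglob(pattern))
```

Called as `select_files("site")` it would cover the whole tree, and as `select_files("site", "pirates.htm")` only that page; the directory name "site" is just a placeholder.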
