Web Wiz - Green Windows Web Hosting

5000 newspaper articles

jellejacob
Newbie
Joined: 19 June 2002
Location: Netherlands
Points: 10
Posted: 11 July 2003 at 2:53am
Hi all, I hope this is the right forum to post in.

I don't know where to start, so I'll start from the beginning.

My client asked me to convert 5000 printed newspaper articles to usable text with OCR (Optical Character Recognition) for use on their intranet, which runs on a Windows server. They would also like an option to search by keyword through the content of the whole range of articles.

I know how to convert these articles to usable text. But my question is: what is the most usable/fastest way of getting these articles into a database, and which database, MS Access or SQL Server? I also have an option to export directly from my OCR program to HTML or XML.

I hope someone can give me some advice.

the boss
Senior Member
Joined: 19 January 2003
Location: Saudi Arabia
Points: 1727
Posted: 11 July 2003 at 5:31pm

Export them to HTML, then use the Web Wiz search application. The application can search the text of a file for given keywords.
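
For anyone wondering what "searching the text of a file" involves for HTML exports, here is a minimal Python sketch. It is not how the Web Wiz search application works; it only shows one way to strip the markup from an exported page and check the remaining text for keywords. The names `TextExtractor`, `html_to_text` and `contains_all_keywords` are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping script/style content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def contains_all_keywords(html, keywords):
    """Case-insensitive check that every keyword occurs in the page text."""
    text = html_to_text(html).lower()
    return all(keyword.lower() in text for keyword in keywords)
```

Called as `contains_all_keywords(page_html, ["flood", "rotterdam"])`, this does a plain substring match, so it would also match "floods" or "flooded". Scanning 5000 files this way on every search is likely to be slow, which is one reason to index the text instead.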


Gullanian
Senior Member
Joined: 04 January 2002
Location: England
Points: 4373
Posted: 11 July 2003 at 7:33pm
Ah yes, but it uses the description meta tag to get the keywords, so you would need to put some HTML keywords into each of the 5000 articles. That could obviously take a while!

the boss
Senior Member
Joined: 19 January 2003
Location: Saudi Arabia
Points: 1727
Posted: 11 July 2003 at 7:40pm
Export them to XML then. Use XSL for formatting, and for searching too, in combination with ASP, I guess.

WebWiz-Bruce
Admin Group
Web Wiz Developer
Joined: 03 September 2001
Location: Bournemouth
Points: 9844
Posted: 12 July 2003 at 1:24am
I would put all the articles in an SQL Server database. Databases are made for searching, so that should be simple enough.
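
A rough sketch of the database approach, for illustration only. It uses Python's built-in `sqlite3` module as a stand-in so it can run anywhere; the thread is about SQL Server, where the connection code and column types would differ. The table name `articles`, its columns, and the helper names are assumptions made up for this example.

```python
import sqlite3


def create_article_store(conn):
    """Create a simple table holding one row per OCR'd article."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS articles (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            body TEXT NOT NULL
        )
        """
    )


def add_article(conn, title, body):
    """Insert one article and return its generated id."""
    cur = conn.execute(
        "INSERT INTO articles (title, body) VALUES (?, ?)", (title, body)
    )
    return cur.lastrowid


def search_articles(conn, keyword):
    """Return (id, title) for every article whose body contains the keyword."""
    # Escape LIKE wildcards so user input is matched literally.
    escaped = (
        keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT id, title FROM articles WHERE body LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    )
    return rows.fetchall()
```

A `LIKE '%keyword%'` search has to read every row, which is probably acceptable for 5000 articles but does not scale well. The query is parameterised, so keywords typed by intranet users are never pasted into the SQL string.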

jellejacob
Newbie
Joined: 19 June 2002
Location: Netherlands
Points: 10
Posted: 12 July 2003 at 4:01am

Thanks, people, for your replies. I think I'll go for the SQL Server database option. XML is at this point a bridge too far for me.

Thanks again!

Bunce
Senior Member
Joined: 10 April 2002
Location: Australia
Points: 846
Posted: 12 July 2003 at 7:29pm

When you say search by keyword, what do you mean?

Remember that if this was a subset of words from an article then you'd need to specify exactly what these keywords are...

If, however, you just mean to search every word in an article, then it's a lot easier.

Might pay to look into the 'Full-Text-Index' feature of SQL Server:
http://www.microsoft.com/sql/evaluation/features/fulltext.asp

Cheers,
Andrew
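
To give a feel for what a full-text index does, here is a small Python sketch of an inverted index: a map from each word to the articles that contain it. This is only an illustration of the idea, not SQL Server's implementation. In SQL Server the index is set up in the database and queried with predicates such as `CONTAINS`. The function names and the sample data shape (a dict of id to text) are assumptions for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def build_index(articles):
    """Map each word to the set of article ids whose text contains it.

    `articles` is a dict of article id -> article text.
    """
    index = defaultdict(set)
    for article_id, text in articles.items():
        for word in tokenize(text):
            index[word].add(article_id)
    return index


def search(index, query):
    """Return the ids of articles containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The index is built once, so a search only looks up the query words instead of rescanning every article. Unlike a substring match, this matches whole words only, so "flood" would not find "flooded" without extra stemming.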

There have been many, many posts made throughout the world...
This was one of them.