
5000 newspaper articles

Printed From: Web Wiz Forums
Category: General Discussion
Forum Name: General Discussion
Forum Description: General discussion and chat on any topic.
URL: https://forums.webwiz.net/forum_posts.asp?TID=4190
Printed Date: 29 March 2026 at 6:49pm
Software Version: Web Wiz Forums 12.08 - https://www.webwizforums.com


Topic: 5000 newspaper articles
Posted By: jellejacob
Subject: 5000 newspaper articles
Date Posted: 11 July 2003 at 2:53am
Hi all, I hope this is the right forum to post in.

I don't know where to start, so I'll start from the beginning.

My client asked me to convert 5000 printed newspaper articles to usable text with OCR (Optical Character Recognition) for use on their intranet, which runs on a Windows server. They would also like an option to search by keyword through the content of the whole range of articles.

I know how to convert these articles to usable text. My question is: what is the fastest and most practical way to get these articles into a database, and which database should I use, MS Access or SQL Server? I also have the option to export directly from my OCR program to HTML or XML.

I hope someone can give me some advice.




Replies:
Posted By: the boss
Date Posted: 11 July 2003 at 5:31pm

Export them to HTML, then use the Web Wiz search application. It can search the text of a file for given keywords.



-------------
http://www.web2messenger.com/theboss


Posted By: Gullanian
Date Posted: 11 July 2003 at 7:33pm
Ah yes, but it uses the description meta tag to get the keywords, so you would need to add HTML keywords to each of the 5000 articles... that could obviously take a while!
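
If the meta tags are the sticking point, they could probably be generated by a script rather than typed by hand. What follows is only a rough Python sketch under assumptions: the function name `make_meta_tags`, the 150-character description limit and the "most frequent words of four or more letters" rule are all made up for illustration, and whether the Web Wiz search application would pick up tags produced this way has not been checked.

```python
import html
import re


def make_meta_tags(article_text, max_description=150, max_keywords=8):
    """Build description and keywords meta tags from plain article text.

    Hypothetical helper: the limits and the keyword rule are assumptions.
    """
    # Collapse whitespace left over from OCR line breaks.
    text = re.sub(r"\s+", " ", article_text).strip()
    description = text[:max_description]

    # Count words of four or more letters and keep the most frequent ones.
    counts = {}
    for word in re.findall(r"[a-z]{4,}", text.lower()):
        counts[word] = counts.get(word, 0) + 1
    # Most frequent first; ties are broken alphabetically.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    keywords = ", ".join(ranked[:max_keywords])

    return (
        '<meta name="description" content="%s">\n'
        '<meta name="keywords" content="%s">'
        % (html.escape(description, quote=True), html.escape(keywords, quote=True))
    )


tags = make_meta_tags("Council approves new harbour bridge. The bridge will open in May.")
```

The resulting two lines would still have to be written into the head of each exported HTML file, and frequency-based keywords are crude, so a stop-word list would likely be needed for real articles.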


Posted By: the boss
Date Posted: 11 July 2003 at 7:40pm
Export them to XML then, and use XSL for formatting. I guess XSL in combination with ASP could handle the searching too.

-------------
http://www.web2messenger.com/theboss


Posted By: WebWiz-Bruce
Date Posted: 12 July 2003 at 1:24am
I would put all the articles in a SQL Server database. Databases are made for searching, so that should be simple enough.
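
To give an idea of the database route, here is a minimal Python sketch. It uses the built-in `sqlite3` module purely as a stand-in so the example runs anywhere; it is not SQL Server, and the `articles` table, its columns and the function names are assumptions made up for illustration. The same table-plus-`LIKE` idea should carry over to SQL Server with its own connection library.

```python
import sqlite3


def create_store():
    """Create an in-memory database with one (hypothetical) articles table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def add_article(conn, title, body):
    """Insert one OCR'd article and return its new id."""
    cur = conn.execute(
        "INSERT INTO articles (title, body) VALUES (?, ?)", (title, body)
    )
    conn.commit()
    return cur.lastrowid


def search(conn, keyword):
    """Return (id, title) for articles whose title or body contains keyword."""
    # Parameters keep user input out of the SQL text itself.
    # Note: % and _ inside keyword still act as LIKE wildcards here.
    pattern = "%" + keyword + "%"
    rows = conn.execute(
        "SELECT id, title FROM articles "
        "WHERE title LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return rows.fetchall()


conn = create_store()
add_article(conn, "Harbour bridge opens", "The new bridge opened on Monday.")
add_article(conn, "Council budget", "The council approved the budget.")
matches = search(conn, "bridge")
```

A `LIKE` search scans every row, which is probably acceptable for 5000 articles but is a substring match, not a word match, so "bridge" would also find "bridges" and "abridged".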

-------------
https://www.webwiz.net/web-wiz-forums/forum-hosting.htm - Web Wiz Forums Hosting
https://www.webwiz.net/web-hosting/windows-web-hosting.htm - ASP.NET Web Hosting


Posted By: jellejacob
Date Posted: 12 July 2003 at 4:01am

Thanks, people, for your replies. I think I'll go for the SQL Server database option. XML is at this point a bridge too far for me.

Thanks again!



Posted By: Bunce
Date Posted: 12 July 2003 at 7:29pm

When you say search by keyword, what do you mean?

Remember that if the keywords are a subset of the words in an article, then you'd need to specify exactly what those keywords are...

If, however, you just mean to search every word in an article, then it's a lot easier.

It might pay to look into the 'Full-Text-Index' feature of SQL Server:
http://www.microsoft.com/sql/evaluation/features/fulltext.asp
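
To show roughly what "search every word" means, here is a small Python sketch of an inverted index, which is the general idea behind word-level indexing. It is not how SQL Server implements its feature, and the names `tokenize`, `build_index` and `search_all` are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(articles):
    """Map each word to the set of article ids that contain it.

    articles is a dict of {article_id: article_text}.
    """
    index = defaultdict(set)
    for article_id, text in articles.items():
        for word in tokenize(text):
            index[word].add(article_id)
    return index


def search_all(index, query):
    """Return ids of articles containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the first word's articles, then narrow with each further word.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


articles = {
    1: "Harbour bridge opens",
    2: "Bridge repairs delayed",
}
index = build_index(articles)
both = search_all(index, "bridge")
only_first = search_all(index, "harbour bridge")
```

Because every word is indexed up front, no one has to choose keywords per article, and a lookup does not need to scan the full text of all 5000 articles.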

Cheers,
Andrew



-------------
There have been many, many posts made throughout the world...
This was one of them.



Copyright ©2001-2026 Web Wiz Ltd. - https://www.webwiz.net