
Non-English PDF Conversion to XML

Printed From: Web Wiz Forums
Category: General Discussion
Forum Name: General Discussion
Forum Description: General discussion and chat on any topic.
URL: https://forums.webwiz.net/forum_posts.asp?TID=23554
Printed Date: 29 March 2026 at 12:01pm
Software Version: Web Wiz Forums 12.08 - https://www.webwizforums.com


Topic: Non-English PDF Conversion to XML
Posted By: proaudience
Subject: Non-English PDF Conversion to XML
Date Posted: 18 June 2007 at 6:17pm
Hi,
I'm a novice on this subject and would like some basic information about converting PDF files to XML. Is it possible to convert a PDF file containing non-English text to XML, and then filter the resulting file for specific words in order to run one's own analyses or queries?

Looking forward to some useful remarks. Many thanks!



Replies:
Posted By: proaudience
Date Posted: 18 June 2007 at 8:06pm
I have some information available online in my native language, in PDF format. What I need to know is whether it could be converted to an XML file, and whether the resulting text (which will of course be non-English, in a different font) could then be filtered to pull out entries based on specific non-English words. Would this be possible in practice in my case?


Posted By: KCWebMonkey
Date Posted: 19 June 2007 at 3:38am
Are you wanting to do this locally or online? I found a utility that runs locally and converts PDF to HTML or XML: http://pdftohtml.sourceforge.net/


Posted By: proaudience
Date Posted: 19 June 2007 at 3:07pm
Originally posted by KCWebMonkey:

Are you wanting to do this locally or online? I found a utility that runs locally and converts PDF to HTML or XML: http://pdftohtml.sourceforge.net/


Thanks, KCWebMonkey. Either locally or online will do. The problem is that it has become difficult to extract fonts from PDF files since 2000, though people do discuss ways of doing it in online forums. This question was put to me by somebody who needs loads of non-English text converted into XML files and then analyzed for specific words in that language. Since both of us are rather uneducated in such tricky computer matters, I decided to post the question here for opinions.

The link you sent is good enough, but I'll first have to learn how to install it properly, since my knowledge stops at the usual exe and zip files that software comes in. LOL
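
For the word-filtering step described above, here is a minimal sketch in Python. It assumes the XML has the layout that pdftohtml's XML mode produces (page elements containing text elements) and that the converter managed to emit real Unicode text, which depends on how the fonts were embedded in the PDF. The function names and the sample data are made up for illustration.

```python
import unicodedata
import xml.etree.ElementTree as ET


def normalize(text):
    """Apply Unicode NFC normalization and case folding so that
    visually identical strings compare equal."""
    return unicodedata.normalize("NFC", text).casefold()


def find_words(xml_string, words):
    """Return (page_number, line_text) pairs for every <text> element
    whose content contains at least one of the given words."""
    targets = [normalize(w) for w in words]
    root = ET.fromstring(xml_string)
    hits = []
    for page in root.iter("page"):
        page_number = page.get("number")
        for node in page.iter("text"):
            # itertext() also collects text inside nested <b>/<i> tags.
            line = "".join(node.itertext())
            folded = normalize(line)
            if any(t in folded for t in targets):
                hits.append((page_number, line))
    return hits


# Hypothetical sample standing in for real converter output.
SAMPLE = """<pdf2xml>
  <page number="1">
    <text top="10" left="10" font="0">Grüße aus München</text>
    <text top="30" left="10" font="0">Nothing relevant here</text>
  </page>
  <page number="2">
    <text top="10" left="10" font="0">Das ist <b>MÜNCHEN</b> im Winter</text>
  </page>
</pdf2xml>"""

if __name__ == "__main__":
    for page_number, line in find_words(SAMPLE, ["München"]):
        print(page_number, line)
```

The match is a plain substring test rather than a whole-word test, because word boundaries work differently from language to language. For a real file on disk, ET.parse(path).getroot() would replace the ET.fromstring call.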



Posted By: KCWebMonkey
Date Posted: 19 June 2007 at 3:26pm

Check the forum linked to that project. I saw a thread there that explains how to use it as a command-line utility.
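
As a rough idea of what command-line use looks like, here is a small Python wrapper around the tool. This is a sketch, not taken from that thread: the flags are the ones documented for the Poppler build of pdftohtml, so an older build may differ (pdftohtml -h lists what yours supports), and the function names are made up for illustration.

```python
import subprocess


def build_command(pdf_path, output_base):
    """Build the argument list for pdftohtml in XML mode.

    -xml   write XML instead of HTML
    -i     skip image extraction
    -enc   encoding of the text output
    """
    return ["pdftohtml", "-xml", "-i", "-enc", "UTF-8", pdf_path, output_base]


def convert(pdf_path, output_base):
    """Run the converter. Raises CalledProcessError on a non-zero exit
    and FileNotFoundError if pdftohtml is not installed or not on PATH."""
    subprocess.run(build_command(pdf_path, output_base), check=True)
    # Assumption: in XML mode the tool writes <output_base>.xml.
    return output_base + ".xml"
```

Calling convert("report.pdf", "report") with a hypothetical report.pdf would be the same as typing pdftohtml -xml -i -enc UTF-8 report.pdf report at a command prompt.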



