Batch Conversion Of Docx To Clean Html
Solution 1:
This looks like just what you need: http://msdn.microsoft.com/en-us/library/ff628051(v=office.14).aspx
The author Eric White blogged about his experiences developing that tool. You can see that list of posts on his blog here: http://blogs.msdn.com/b/ericwhite/archive/2008/10/20/eric-white-s-blog-s-table-of-contents.aspx#Open_XML_to_XHtml
Solution 2:
Since I'm a big fan of Aspose.Words, a commercial library to create/process Word documents, I would do something like:
- Open the Word document with Aspose.Words.
- Save the Word document as HTML.
- Use something like SgmlReader or HTML Agility Pack (or even Regular Expressions if it is suitable) to remove unwanted HTML tags/attributes.
Since you wrote you work at an university, I'm not sure whether commercial packages are an option, though.
Solution 3:
Hi not sure what the rules are on promoting your own solutions, so do let me know if I am out of line.
I am a web developer who had the same issues, so I created my own tool: http://www.convertwordtohtml.com
We are also working on a new version that will have even better conversion quality and one click conversion eg you can right click on a word file and it will be directly converted to html and the code placed into the clipboard. The current version also supports command line access and the new version will have a server version to.
There is a free trial version downloadable from the site , and if you have any questions do contact me any time.
Post a Comment for "Batch Conversion Of Docx To Clean Html"