Jan 13 2008

.docx to HTML or PDF using Java

Doug Mahugh recently mentioned someone using the DocX2Html.xsl that ships with SharePoint to preview DOCX files in HTML.

As it happens, we’ve just implemented HTML and PDF output in docx4j using a similar approach: an XSLT stylesheet transforms the document XML into HTML. We’re using the earlier WordML2HTML XSLT stylesheet available from Oleg Tkachenko. (It would be great if Microsoft also made the presumably newer DocX2Html.xsl that ships with SharePoint freely available.)
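For readers who haven't applied a stylesheet from Java before, here is a minimal sketch of the general technique using only the JDK's javax.xml.transform API. It is not the docx4j code. The class name XsltToHtmlSketch, the toy input document and the toy stylesheet are all made up for illustration; in practice the stylesheet would be something like WordML2HTML and the input would be the document XML taken from the .docx package.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltToHtmlSketch {

    // Hypothetical stand-in for the document XML; real WordprocessingML
    // is namespaced and far richer than this.
    static final String TOY_XML =
        "<document><para>Hello from a toy document</para></document>";

    // Hypothetical stand-in for a stylesheet such as WordML2HTML:
    // it maps each <para> to an HTML <p>.
    static final String TOY_XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/document'>"
      + "<html><body><xsl:apply-templates select='para'/></body></html>"
      + "</xsl:template>"
      + "<xsl:template match='para'>"
      + "<p><xsl:value-of select='.'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compile the stylesheet, run it over the input XML, and return
    // the serialized result as a string.
    static String transform(String xml, String xsl) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(TOY_XML, TOY_XSL));
    }
}
```

With a real stylesheet you would typically swap the StringReader sources for file or stream sources, and reuse the compiled stylesheet (via Templates) if you are converting many documents.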

To create the HTML, we use Sun’s xhtmlrenderer (thanks Sun!). See the obligatory tutorial.

To create the PDF, we take the HTML and run it through Sun’s pdf-renderer (thanks again, Sun). Again, see the tutorial.

The icing on the cake is the PDF Viewer that comes with pdf-renderer. It will give us print preview and printing in docx4all.

Finally, thanks to Lars for bringing pdf-renderer to my attention.

