
Capability of converting .doc to .docx

Posted: Sat Nov 08, 2008 12:03 am
by seaster8
Hello all,

I am researching ways to convert old Word documents (in .doc format) into the new Word format (.docx). I came across your software and am wondering whether your tools can do this.

Thanks.

Re: Capability of converting .doc to .docx

Posted: Sat Nov 08, 2008 8:06 am
by jason
Hi

Yes, in principle, you can use docx4j to convert binary .doc to OpenXML .docx.

See http://dev.plutext.org/trac/docx4j/browser/trunk/docx4j/src/main/java/org/docx4j/convert/in/Doc.java

HOWEVER, this is a basic proof of concept rather than something that will handle a wide range of real-world documents.

There are two issues:

1. It uses Apache's POI to read the binary .doc, and POI's handling of Word documents (the component it calls HWPF) is quite limited.

2. For each feature POI can handle, we then need to write support for creating a corresponding docx4j object.

Currently, Doc.java has basic support for paragraphs and tables.
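
To make the second issue concrete, here is a minimal sketch of the feature-by-feature mapping idea. It is not the code in Doc.java, and it uses neither POI nor docx4j: the Paragraph, Table and Unsupported classes are hypothetical stand-ins for whatever the .doc reader exposes, and the output is simplified WordprocessingML text (no namespace declaration, no table properties or grid) rather than real docx4j objects. The point is only that each feature needs its own mapping, and anything without one is lost.

```java
import java.util.Arrays;
import java.util.List;

public class DocFeatureMapper {

    // Hypothetical stand-ins for features read from a binary .doc file.
    interface Feature {}

    static final class Paragraph implements Feature {
        final String text;
        Paragraph(String text) { this.text = text; }
    }

    static final class Table implements Feature {
        final List<List<String>> rows;
        Table(List<List<String>> rows) { this.rows = rows; }
    }

    // Any feature with no mapping written for it yet (e.g. a footnote).
    static final class Unsupported implements Feature {
        final String kind;
        Unsupported(String kind) { this.kind = kind; }
    }

    // Escape the characters that are not allowed in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // A paragraph becomes w:p containing one run (w:r) with one text node (w:t).
    static String paragraphXml(String text) {
        return "<w:p><w:r><w:t>" + escape(text) + "</w:t></w:r></w:p>";
    }

    // A table becomes w:tbl > w:tr > w:tc, each cell holding one paragraph.
    static String tableXml(Table table) {
        StringBuilder sb = new StringBuilder("<w:tbl>");
        for (List<String> row : table.rows) {
            sb.append("<w:tr>");
            for (String cell : row) {
                sb.append("<w:tc>").append(paragraphXml(cell)).append("</w:tc>");
            }
            sb.append("</w:tr>");
        }
        return sb.append("</w:tbl>").toString();
    }

    // Map each feature in order; features with no mapping are dropped.
    static String toBodyXml(List<Feature> features) {
        StringBuilder sb = new StringBuilder("<w:body>");
        for (Feature f : features) {
            if (f instanceof Paragraph) {
                sb.append(paragraphXml(((Paragraph) f).text));
            } else if (f instanceof Table) {
                sb.append(tableXml((Table) f));
            }
            // Unsupported (or any other) features fall through and are skipped.
        }
        return sb.append("</w:body>").toString();
    }

    public static void main(String[] args) {
        List<Feature> features = Arrays.<Feature>asList(
                new Paragraph("Hello & welcome"),
                new Unsupported("footnote"),
                new Table(Arrays.asList(Arrays.asList("a", "b"))));
        System.out.println(toBodyXml(features));
    }
}
```

In this sketch the footnote disappears from the output without any warning, which is roughly what to expect from a converter for any Word feature that has no mapping yet.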

Have you looked at http://b2xtranslator.sourceforge.net/blog/? That is written in C#, but it is open source.

cheers

Jason