
converting binary .doc

Posted: Tue Jul 27, 2010 7:28 pm
by jason
I am trying to convert a .doc file into a .docx file using Doc.java, but it throws the following exception:

##SummaryInformation
##DocumentSummaryInformation
##WordDocument
##CompObj
##ObjectPool
##1Table
Exception in thread "main" java.io.IOException: block[ 10 ] already removed - does your POIFS have circular or duplicate block references?
at org.apache.poi.poifs.storage.BlockListImpl.remove(BlockListImpl.java:97)
at org.apache.poi.poifs.storage.RawDataBlockList.remove(RawDataBlockList.java:32)
at org.apache.poi.poifs.storage.BlockAllocationTableReader.fetchBlocks(BlockAllocationTableReader.java:196)
at org.apache.poi.poifs.storage.BlockListImpl.fetchBlocks(BlockListImpl.java:132)
at org.apache.poi.poifs.storage.RawDataBlockList.fetchBlocks(RawDataBlockList.java:32)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.processProperties(POIFSFileSystem.java:542)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:176)
at org.apache.poi.hwpf.HWPFDocument.verifyAndBuildPOIFS(HWPFDocument.java:133)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:146)
at org.docx4j.convert.in.Doc.convert(Doc.java:58)
at org.docx4j.convert.in.Doc.main(Doc.java:396)

I think this is the line that causes the error:

HWPFDocument doc = new HWPFDocument(in);



That is a problem in POI's hwpf support. You might try their most recent code.
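
For context, the message in the trace says that a block in the file's allocation chain was asked for twice. The sketch below is a simplified model of that kind of check, not POI's actual implementation: the class name, method signature, and array layout are all made up for illustration. It walks a chain of block indexes, marks each block as removed once it has been claimed, and throws if the same block turns up again.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a block-chain walk. Not POI code.
class BlockChainSketch {
    // End-of-chain marker used by this sketch (an assumption for illustration).
    static final int END_OF_CHAIN = -2;

    /**
     * Follows the chain that starts at startBlock.
     * nextBlock[i] is the index of the block that follows block i.
     * removed[i] records whether block i has already been claimed.
     */
    static List<Integer> fetchBlocks(int startBlock, int[] nextBlock, boolean[] removed)
            throws IOException {
        List<Integer> chain = new ArrayList<>();
        int current = startBlock;
        while (current != END_OF_CHAIN) {
            if (current < 0 || current >= nextBlock.length) {
                throw new IOException("block[ " + current + " ] is out of range");
            }
            if (removed[current]) {
                // A block can belong to only one chain, and only once.
                throw new IOException("block[ " + current + " ] already removed");
            }
            removed[current] = true;
            chain.add(current);
            current = nextBlock[current];
        }
        return chain;
    }

    public static void main(String[] args) {
        try {
            // Well-formed table: 0 -> 1 -> 2 -> end.
            int[] good = {1, 2, END_OF_CHAIN};
            System.out.println(fetchBlocks(0, good, new boolean[good.length]));

            // Circular table: 0 -> 1 -> 0 -> ...
            int[] circular = {1, 0};
            System.out.println(fetchBlocks(0, circular, new boolean[circular.length]));
        } catch (IOException e) {
            System.out.println("Failed: " + e.getMessage());
        }
    }
}
```

In this model the error can come from a genuinely damaged file or from a reader that misinterprets the table. The sketch cannot tell those two cases apart; it only shows what the message is describing.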

Alternatively, I have had good results with OpenOffice v3.x (the Novell version) and http://code.google.com/p/jodconverter/

I am likely to remove the hwpf code from docx4j.

There is also http://b2xtranslator.sourceforge.net/ but it's not Java.

problem in converting doc to docx

Posted: Tue Jul 27, 2010 8:42 pm
by joyy
After trying a second test case .doc file, the conversion completed successfully,
but the output is plain text in a single font size; it does not carry over any styles or font colours.


I have attached both test case .doc files to this post.

Thanks.

Re: problem in converting doc to docx

Posted: Tue Jul 27, 2010 9:15 pm
by jason
The same comments apply to this further post, i.e. I suggest one of those alternative approaches. That said, the POI representation may contain the formatting information, and if so, you could carry it across in the conversion if you were prepared to add some code on the docx4j side.