
Docx4J throws an exception when it encounters missing image

Posted: Thu Nov 17, 2022 6:06 pm
by Extern
Hi,

I want to open several docx files, some of which have been edited in several Word versions over the past few years. It seems that Word sometimes saved them in an invalid state. Word itself can handle this, but docx4j 8.3.8 throws an exception like this one:
Code:
org.docx4j.openpackaging.exceptions.Docx4JException: For source /word/header1.xml, cannot find part word/NULL from rel rId1=NULL
   at org.docx4j.openpackaging.io3.Load3.getRawPart(Load3.java:626) ~[docx4j-core-8.3.8.jar:na]
   at org.docx4j.openpackaging.io3.Load3.getPart(Load3.java:372) ~[docx4j-core-8.3.8.jar:na]
   at org.docx4j.openpackaging.io3.Load3.addPartsFromRelationships(Load3.java:278) ~[docx4j-core-8.3.8.jar:na]
   at org.docx4j.openpackaging.io3.Load3.getPart(Load3.java:400) ~[docx4j-core-8.3.8.jar:na]
   at org.docx4j.openpackaging.io3.Load3.addPartsFromRelationships(Load3.java:278) ~[docx4j-core-8.3.8.jar:na]
   at org.docx4j.openpackaging.io3.Load3.getPart(Load3.java:400) ~[docx4j-core-8.3.8.jar:na]
   at org.docx4j.openpackaging.io3.Load3.addPartsFromRelationships(Load3.java:278) ~[docx4j-core-8.3.8.jar:na]
   at org.docx4j.openpackaging.io3.Load3.get(Load3.java:196) ~[docx4j-core-8.3.8.jar:na]
   at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:572) ~[docx4j-core-8.3.8.jar:na]
   at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:421) ~[docx4j-core-8.3.8.jar:na]
   at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:387) ~[docx4j-core-8.3.8.jar:na]
   at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:352) ~[docx4j-core-8.3.8.jar:na]
   at org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(WordprocessingMLPackage.java:182) ~[docx4j-core-8.3.8.jar:na]

I uploaded word_null.docx as an example. The header of page 2 contains an image which seems to have lost its imagedata. Word itself displays the missing data as shown in the uploaded screenshot.
Is it possible to handle this exception inside docx4j, so that the file can still be loaded? This more robust behaviour would suit my use cases better.
Kind regards,
Christian.

Re: Docx4J throws an exception when it encounters missing im

Posted: Sat Nov 19, 2022 11:22 am
by jason
Exploring Word's behaviour for various cases:
+ It seems that if the rel exists but the image is missing, then Word removes @r:embed from a:blip and deletes the rel.
+ If the rel does not exist, then Word removes @r:embed from a:blip.
+ If the rel exists but points to the styles part (anything which isn't an image?), then Word removes @r:embed from a:blip and deletes the rel.
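In all three cases the visible effect on the XML is the same: @r:embed disappears from a:blip. A minimal sketch of that step using only the JDK's DOM API (this is not docx4j code and not the linked commit; the class name BlipCleaner, the method names, and the idea of passing in a set of "usable" image rel ids are assumptions made for illustration):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
final class BlipCleaner {
    static final String A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";
    static final String R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    /** Parses a part's XML (e.g. the content of header1.xml) with namespace awareness. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }

    /**
     * Removes r:embed from every a:blip whose rel id is not in usableImageRelIds
     * (the ids the caller has verified to point at an existing image part).
     * Returns the number of attributes removed. Deleting the rel itself is left to the caller.
     */
    static int stripBrokenEmbeds(Document doc, Set<String> usableImageRelIds) {
        NodeList blips = doc.getElementsByTagNameNS(A_NS, "blip");
        int removed = 0;
        for (int i = 0; i < blips.getLength(); i++) {
            Element blip = (Element) blips.item(i);
            if (blip.hasAttributeNS(R_NS, "embed")
                    && !usableImageRelIds.contains(blip.getAttributeNS(R_NS, "embed"))) {
                blip.removeAttributeNS(R_NS, "embed");
                removed++;
            }
        }
        return removed;
    }
}
```

The sketch only covers DrawingML a:blip as described above; VML pictures reference their image differently and would need separate handling.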

Word shows a "picture can't be displayed" dummy image on screen, in Save As PDF output, and in print, but it doesn't actually save that image to the file.

So there seem to be some behaviours we could emulate. https://github.com/plutext/docx4j/commi ... 6027512e9a addresses your specific issue.

What docx4j does in the "rel exists" case depends on whether the rel resolves to a content type (in which case it can create a part) or not (as in your target=NULL case, now addressed).
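If you want to find out which of your files contain such a rel before loading them, a rough library-independent check is possible with just the JDK. The sketch below is an assumption-laden illustration, not part of docx4j: the class name DanglingRels and both method names are made up, it only checks that each internal Target resolves to an entry in the zip, and it assumes targets are valid URI references.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
final class DanglingRels {
    /** Reads every file entry of a docx (a zip) into memory, keyed by entry name. */
    static Map<String, byte[]> readEntries(InputStream in) {
        Map<String, byte[]> entries = new TreeMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                if (!e.isDirectory()) entries.put(e.getName(), zip.readAllBytes());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return entries;
    }

    /** Lists internal relationships whose Target is not an entry, as "source: id -> target". */
    static List<String> findDangling(Map<String, byte[]> entries) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : new TreeMap<>(entries).entrySet()) {
            String name = e.getKey();
            int i = name.lastIndexOf("_rels/");
            if (i < 0 || !name.endsWith(".rels")) continue;
            // word/_rels/header1.xml.rels describes the part /word/header1.xml
            String source = "/" + name.substring(0, i) + name.substring(i + 6, name.length() - 5);
            NodeList rels = parse(e.getValue()).getElementsByTagNameNS("*", "Relationship");
            for (int k = 0; k < rels.getLength(); k++) {
                Element rel = (Element) rels.item(k);
                if ("External".equals(rel.getAttribute("TargetMode"))) continue;
                String target = rel.getAttribute("Target");
                // Relative targets are resolved against the source part's location.
                String resolved = URI.create(source).resolve(target).getPath();
                if (!entries.containsKey(resolved.substring(1))) {
                    problems.add(source + ": " + rel.getAttribute("Id") + " -> " + target);
                }
            }
        }
        return problems;
    }

    private static Document parse(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }
}
```

Calling findDangling(readEntries(...)) on a stream opened from the docx returns an empty list when every internal target exists in the package.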

Re: Docx4J throws an exception when it encounters missing im

Posted: Sat Dec 10, 2022 12:54 am
by Extern
Thanks a lot, Jason, for addressing this issue. Will this fix be available in version 8, too?