Reading header with drawing corrupts docx file

Posted: Tue Sep 11, 2018 1:28 am
by sdeal
Hello,

I am running into a problem when I attempt to read the contents of a header that contains Word drawings. I can read and modify the contents in code without trouble, but once the file is saved, Word is unable to open it. The error Word reports is "The XML data is invalid according to the schema. Location: Part: /word/header3.xml, Line: 0, Column: 0". I am using version 6.0.1 of docx4j through Maven. I have found that this only seems to happen when I am using MOXy. The project I need this code for runs in WebLogic Server, which uses MOXy as the default JAXB implementation. I tried setting the Java system properties as defined at the end of this page to switch to the Glassfish RI provider, but it didn't seem to work. I should note that I have reproduced this issue both inside and outside of WebLogic.

This code should reproduce the issue using the problematic documents (args[0] is the file to read from, and args[1] is the file to save to):

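Code:
import java.io.File;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.PartName;
import org.docx4j.openpackaging.parts.WordprocessingML.HeaderPart;
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File(args[0]));
HeaderPart part = (HeaderPart) wordMLPackage.getParts().get(new PartName("/word/header3.xml"));
part.getContents();
wordMLPackage.save(new File(args[1]));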


After some experimentation, stepping through the getContents() and unmarshal() code, the problem seems to occur when the part is unmarshalled and the resulting Hdr object is set as the contents of the HeaderPart. I've attached the problematic file from both before and after running the above code. Do you have any idea what the issue could be? Let me know if you need more information.
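
For anyone who wants to compare the before and after files directly, here is a rough sketch that pulls a single part out of a .docx package using only java.util.zip, with no docx4j involved. The class and method names (DocxPartDump, readPart) are made up for illustration, and it assumes the part is stored as UTF-8, which is the usual case. Note that zip entry names carry no leading slash, so the part /word/header3.xml is the entry word/header3.xml.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of docx4j.
final class DocxPartDump {

    /** Returns the text of the named zip entry (e.g. "word/header3.xml"), or null if absent. */
    static String readPart(InputStream docx, String entryName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(entryName)) {
                    continue;
                }
                // Copy the current entry's bytes; read() returns -1 at end of entry.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = zip.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                // Assumption: the part is UTF-8 encoded.
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
        return null;
    }
}
```

Calling readPart(new FileInputStream(args[0]), "word/header3.xml") and the same for args[1], then comparing the two root elements, should show whether any namespace declarations changed when the file was saved.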

Re: Reading header with drawing corrupts docx file

Posted: Tue Sep 11, 2018 3:07 pm
by jason
Thanks for the clear problem report.

The problem is that your header contains:

    <w:r>
      <mc:AlternateContent>
        <mc:Choice Requires="wps">
 


and although that attribute value doesn't require the wps namespace to be declared, Word expects it to be (somewhere, for example on the root node).
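
To check a part for this condition, something along these lines might help. It is only a sketch using the JDK's DOM parser, not docx4j or MOXy, and the class and method names (RequiresPrefixCheck, undeclaredRequiresPrefixes) are hypothetical. It collects the prefixes listed in each mc:Choice Requires attribute and reports the ones that have no namespace declaration in scope.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of docx4j.
final class RequiresPrefixCheck {

    static final String MC_NS =
        "http://schemas.openxmlformats.org/markup-compatibility/2006";

    /** Returns prefixes named in mc:Choice/@Requires that have no in-scope xmlns declaration. */
    static Set<String> undeclaredRequiresPrefixes(String partXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
            .parse(new ByteArrayInputStream(partXml.getBytes(StandardCharsets.UTF_8)));

        Set<String> missing = new TreeSet<>();
        NodeList choices = doc.getElementsByTagNameNS(MC_NS, "Choice");
        for (int i = 0; i < choices.getLength(); i++) {
            Element choice = (Element) choices.item(i);
            String requires = choice.getAttribute("Requires").trim();
            if (requires.isEmpty()) {
                continue;
            }
            // Requires is a whitespace-separated list of namespace prefixes.
            for (String prefix : requires.split("\\s+")) {
                // lookupNamespaceURI walks up the ancestors looking for xmlns:prefix.
                if (choice.lookupNamespaceURI(prefix) == null) {
                    missing.add(prefix);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        String head = "<w:hdr xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
            + " xmlns:mc=\"" + MC_NS + "\"";
        String body = "><w:r><mc:AlternateContent><mc:Choice Requires=\"wps\"/>"
            + "</mc:AlternateContent></w:r></w:hdr>";
        String wpsDecl =
            " xmlns:wps=\"http://schemas.microsoft.com/office/word/2010/wordprocessingShape\"";

        System.out.println(undeclaredRequiresPrefixes(head + body));           // prints [wps]
        System.out.println(undeclaredRequiresPrefixes(head + wpsDecl + body)); // prints []
    }
}
```

An empty result means every prefix named in Requires is declared somewhere at or above the mc:Choice element, for example on the root node.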

I'll take a look to see how best to coerce MOXy to do that.

Re: Reading header with drawing corrupts docx file

Posted: Thu Sep 13, 2018 3:51 pm
by jason

Re: Reading header with drawing corrupts docx file

Posted: Fri Sep 14, 2018 1:37 am
by sdeal
The new build seems to have done the trick! Thanks a ton! I'm guessing this new fix will be in the next release of docx4j; do you have any idea when that might be? It isn't a huge deal, just curious.

Re: Reading header with drawing corrupts docx file

Posted: Thu Sep 20, 2018 10:11 am
by jason
sdeal wrote: the next release of docx4j; do you have any idea when that might be?


In 2 to 4 weeks? There will be a short beta period first.