convertAltChunks removing altChunks from docx

Posted: Sat May 04, 2019 6:18 am
by steveEP
I have a docx I am creating using a template with placeholders. I find the placeholders and add alt chunks using this method:

Code:
   public static void addAltChunk(R run, String html){
      String chunk = "<!DOCTYPE><html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\"><head></head><body>" + html + "</body></html>";
      try {
         mdp.addAltChunk(AltChunkType.Xhtml, chunk.getBytes(), run);
      } catch (Exception e) {
         e.printStackTrace();
      }
   }


I insert some of the altchunks into table cells. (The Tc contains a P which contains an R. I use that R as the ContentAccessor to insert the altchunk).

When I save the doc without converting the altchunks, everything works great. However, I need to convert the docx to PDF afterwards, and when I do so the altchunks do not display (because they have not been converted).

So I call:

Code:
WordprocessingMLPackage tempPackage = mdp.convertAltChunks();
tempPackage.save(new java.io.File(finalPath + documentName));


The resulting docx is missing the altChunk content. There are no errors. My dependencies are the following:

Code:
      <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-JAXB-Internal</artifactId>
        <version>8.0.0</version>
      </dependency>
      <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-ImportXHTML</artifactId>
        <version>8.0.0</version>
      </dependency>


I appreciate any help, thanks!

Re: convertAltChunks removing altChunks from docx

Posted: Sat May 04, 2019 6:24 am
by jason
Could you please post a sample docx containing altChunks which docx4j is not converting?

Re: convertAltChunks removing altChunks from docx

Posted: Tue May 07, 2019 2:28 am
by steveEP
Thanks Jason. I've attached the docx which contains the altchunks. This is before converting the altchunks since once I convert them the altchunks are removed from the doc.

altChunkTest.docx

Re: convertAltChunks removing altChunks from docx

Posted: Wed May 08, 2019 10:18 am
by jason
In your docx, you have placed your w:altChunk inside paragraph runs (w:p/w:r):

Code:
        <w:p w14:paraId="39449174" w14:textId="78B523F1">
            <w:r>
                <w:t></w:t>
                <w:altChunk r:id="rId29"/>
            </w:r>
        </w:p>
 


This is wrong (an altChunk is block level content and should be a sibling of w:p), so on loading it results in log messages like:

Code:
09:08:12.235 [main] WARN  o.d.jaxb.JaxbValidationEventHandler 89 - [ERROR] : unexpected element (uri:"http://schemas.openxmlformats.org/wordprocessingml/2006/main", local:"altChunk"). Expected elem
09:08:12.235 [main] WARN  o.d.jaxb.JaxbValidationEventHandler 112 - troublesome node: <w:altChunk xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" r:id="rId28"/>
09:08:12.235 [main] WARN  o.d.jaxb.JaxbValidationEventHandler 114 - in parent node: <w:r xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
                            <w:altChunk xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" r:id="rId28"/>
                        </w:r>
09:08:12.235 [main] WARN  o.d.jaxb.JaxbValidationEventHandler 121 - #document/document/body/tbl/tr/tc/p/r/altChunk
09:08:12.235 [main] INFO  o.d.jaxb.JaxbValidationEventHandler 183 - continuing (with possible element/attribute loss)


What you need to do: in your input docx, ensure your altChunks are siblings of w:p, not children of w:r.

(Also, ensure the XHTML content of each altChunk is well-formed XML.)
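One way to test the well-formedness point before handing a chunk to docx4j is to run it through a plain XML parser first. The following is only a sketch using the JDK's SAX parser; the class and method names (WellFormedCheck, isWellFormed) are made up for illustration and are not part of docx4j. Note that under the XML spec a doctype declaration needs a root element name, so a bare `<!DOCTYPE>` with nothing after it is not valid; dropping it or writing `<!DOCTYPE html>` avoids that.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not a docx4j class.
public class WellFormedCheck {

    /** Returns true if the given string parses as well-formed XML. */
    public static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors, so malformed input
            // surfaces here as a SAXException.
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not run the XML parser", e);
        }
    }

    public static void main(String[] args) {
        String good = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head></head><body><p>Hello</p></body></html>";
        // <br> is never closed, which is fine in HTML but not in XML.
        String bad = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body><p>Hello<br></p></body></html>";
        System.out.println(isWellFormed(good));
        System.out.println(isWellFormed(bad));
    }
}
```

Running a check like this on the wrapped HTML string before calling addAltChunk should catch unclosed tags and similar problems early, rather than at conversion time.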

Re: convertAltChunks removing altChunks from docx

Posted: Thu May 09, 2019 4:16 am
by steveEP
Hi Jason, thanks for the response!

When I add the altChunk without the AttachmentPoint argument, it adds the altChunk at the end of the document (as expected) and works correctly when I call convertAltChunks(). Thank you.

My problem is that I need to add the altChunks at specific locations (using placeholders). I can find the w:p that contains the placeholder, and I know I need to add the altChunks as siblings of that w:p, but when I call getParent() on the w:p that contains the placeholder, it returns the Body.

So how can I add the altChunks at the specific location if they are only children of the body? Is there some kind of ContentAccessor I can create and add to the document using the index of the w:p which contains the placeholder? And then use that as the AttachmentPoint when adding the altchunk?

Thanks again!

Re: convertAltChunks removing altChunks from docx

Posted: Fri May 10, 2019 6:52 am
by jason
You are correct, the methods in https://github.com/plutext/docx4j/blob/ ... kHost.java all add the altChunk to the end of the content list.

It would be straightforward to adapt one of these methods, to insert at a particular index, since ((ContentAccessor)this).getContent() gives you a plain old List. For example, alter https://github.com/plutext/docx4j/blob/ ... t.java#L80

You just need to know the index you want to insert at.
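To make the "plain old List" point concrete, here is a rough sketch of the index-based insert using only java.util. It is not the JaxbXmlPartAltChunkHost code: the class and method names (InsertAtIndex, indexOfSame, insertAfter) are hypothetical, and plain strings stand in for the paragraph and altChunk objects. In a real document the list would be the parent's getContent(), and since such lists may hold wrapped objects, the lookup might need adjusting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of docx4j.
public class InsertAtIndex {

    /** Finds the position of target in content by identity (==), or -1 if absent. */
    public static int indexOfSame(List<Object> content, Object target) {
        for (int i = 0; i < content.size(); i++) {
            if (content.get(i) == target) {
                return i;
            }
        }
        return -1;
    }

    /** Inserts newItem directly after anchor; returns false if anchor is not in the list. */
    public static boolean insertAfter(List<Object> content, Object anchor, Object newItem) {
        int index = indexOfSame(content, anchor);
        if (index < 0) {
            return false;
        }
        content.add(index + 1, newItem);
        return true;
    }

    public static void main(String[] args) {
        // Stand-ins for block-level content: two ordinary paragraphs around a placeholder paragraph.
        List<Object> body = new ArrayList<>(Arrays.asList("p1", "placeholderP", "p3"));
        Object placeholder = body.get(1);

        // The new item becomes a sibling of the placeholder paragraph, not a child of it.
        insertAfter(body, placeholder, "altChunk");
        System.out.println(body);
    }
}
```

The same idea applies when the placeholder paragraph sits inside a table cell: the list to modify is then the content list of whatever the paragraph's parent is, rather than the body's.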

Again, would you mind adding an issue? "JaxbXmlPartAltChunkHost should support insert at index."

It may be too late for you now, but as an alternative to this, you could consider OpenDoPE. It can bind XHTML to a content control. So your XML data part would contain XML elements containing encoded XHTML (think &lt; for <) and then when docx4j encounters a content control pointing at that (via XPath), it converts the XHTML to docx content, and inserts it into the document.
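As a small illustration of the "encoded XHTML" part only, the sketch below escapes markup characters so an XHTML fragment can sit as text inside an XML element. The class name (XhtmlEscape) and the `<description>` element are invented for the example; the actual layout of the XML data part depends on your own bindings.

```java
// Hypothetical helper, not part of docx4j or OpenDoPE.
public class XhtmlEscape {

    /** Escapes &, < and > so the fragment can be stored as XML text content. */
    public static String escapeForXml(String xhtml) {
        StringBuilder out = new StringBuilder(xhtml.length() + 16);
        for (int i = 0; i < xhtml.length(); i++) {
            char c = xhtml.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fragment = "<p>Terms & conditions</p>";
        // <description> is an assumed element name in a custom XML data part.
        System.out.println("<description>" + escapeForXml(fragment) + "</description>");
    }
}
```

If the data part is built with a DOM or JAXB API instead of string concatenation, setting the fragment as text content should handle this escaping for you.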

I guess the reason nobody has asked for "JaxbXmlPartAltChunkHost should support insert at index" before is that with the OpenDoPE approach, you don't need it.

Re: convertAltChunks removing altChunks from docx

Posted: Sun May 26, 2019 5:45 pm
by jason