Replace a variable with a page break

Posted: Mon Apr 18, 2022 10:36 pm
by vickov
Hello,

I want to make some minor changes to a docx document: I want to insert a page break (w:type="page") in place of a certain variable. The document XML is prepared correctly, and the variable is in the ${pagebreak} format.
My script looks like this:
Code: Select all
File templateZip = new File("resources/doctest.docx")
WordprocessingMLPackage template = WordprocessingMLPackage.load(templateZip)
def map = ["pagebreak": "<w:r><w:br w:type=\"page\"/></w:r>"]
template.mainDocumentPart.variableReplace(map)


At the last line however, I receive the following exception:
Code: Select all
javax.xml.bind.JAXBException: Preprocessing exception
- with linked exception:
[org.docx4j.openpackaging.exceptions.Docx4JException: Cannot perform the transformation]
   at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:574)
   at org.docx4j.XmlUtils.unmarshallFromTemplate(XmlUtils.java:636)
   at org.docx4j.openpackaging.parts.JaxbXmlPart.variableReplace(JaxbXmlPart.java:303)
   at org.docx4j.openpackaging.parts.JaxbXmlPart$variableReplace.call(Unknown Source)
   at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
   at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
   at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
   at ideaGroovyConsole.run(ideaGroovyConsole.groovy:12)
   at groovy.lang.GroovyShell.runScriptOrMainOrTestOrRunnable(GroovyShell.java:254)
   at groovy.lang.GroovyShell.run(GroovyShell.java:360)
   at groovy.lang.GroovyShell.run(GroovyShell.java:339)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.base/java.lang.reflect.Method.invoke(Method.java:567)
   at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
   at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSite.invoke(PogoMetaMethodSite.java:170)
   at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:73)
   at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148)
   at console.run(console.groovy:11)
   at groovy.ui.GroovyMain.processReader(GroovyMain.java:631)
   at groovy.ui.GroovyMain.processFiles(GroovyMain.java:552)
   at groovy.ui.GroovyMain.run(GroovyMain.java:396)
   at groovy.ui.GroovyMain.access$1400(GroovyMain.java:68)
   at groovy.ui.GroovyMain$GroovyCommand.process(GroovyMain.java:322)
   at groovy.ui.GroovyMain.processArgs(GroovyMain.java:142)
   at groovy.ui.GroovyMain.main(GroovyMain.java:115)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.base/java.lang.reflect.Method.invoke(Method.java:567)
   at org.codehaus.groovy.tools.GroovyStarter.rootLoader(GroovyStarter.java:111)
   at org.codehaus.groovy.tools.GroovyStarter.main(GroovyStarter.java:129)
Caused by: org.docx4j.openpackaging.exceptions.Docx4JException: Cannot perform the transformation
   at org.docx4j.XmlUtils.transform(XmlUtils.java:1368)
   at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:569)
   ... 32 more
Caused by: javax.xml.transform.TransformerException: com.sun.istack.SAXParseException2; unexpected element (uri:"http://schemas.openxmlformats.org/wordprocessingml/2006/main", local:"r"). Expected elements are <{ }text>


I saw some other topics here about issues with page breaks, but most of them involve PDF generation. I guess it always comes down to the unmarshalling process, but I still can't figure out how to correct this.
Any ideas or hints will be greatly appreciated.

Re: Replace a variable with a page break

Posted: Thu Apr 21, 2022 7:27 pm
by jason
Variable replacement is "dumb" in the sense that all it does is replace <w:r><w:t>${var}</w:t></w:r> with <w:r><w:t>result</w:t></w:r>, so you can see the problem: you'd end up with your break inside the w:t.

You could fix this in the transformation which is causing the exception; see docx-java-f6/node-path-utlization-targeting-paragraphs-t3042.html#p10383

But it would be better to take an approach to inserting your page breaks which avoids creating invalid XML in the first place...

You could marshal to a string, replace <w:t>${var}</w:t> with <w:br w:type="page"/>, then unmarshal.
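
A minimal sketch of that string-level idea, written against plain JDK strings so it runs without docx4j. The class and method names are made up for illustration, and the input stands in for whatever the marshalled document XML looks like; real marshalled output may carry attributes on w:t (for example xml:space), in which case the literal match would need adjusting.

```java
public class PageBreakStringReplace {
    // The placeholder as it is assumed to appear in the marshalled XML.
    static final String PLACEHOLDER_TEXT = "<w:t>${pagebreak}</w:t>";

    // A page break is a child of the run (w:r), so it replaces the w:t only.
    static final String PAGE_BREAK = "<w:br w:type=\"page\"/>";

    /** Replaces every placeholder text element with a page break element. */
    static String insertPageBreaks(String documentXml) {
        // String.replace is literal, so '$' and '{' need no regex escaping.
        return documentXml.replace(PLACEHOLDER_TEXT, PAGE_BREAK);
    }

    public static void main(String[] args) {
        String xml = "<w:p><w:r><w:t>${pagebreak}</w:t></w:r></w:p>";
        // prints <w:p><w:r><w:br w:type="page"/></w:r></w:p>
        System.out.println(insertPageBreaks(xml));
    }
}
```

The surrounding w:r is left untouched, so the break ends up as a direct child of the existing run.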

Or you could work at the JAXB object level: traverse the document programmatically. See for example the sample TraverseRemoveVanish
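
To illustrate the object-level idea without pulling in docx4j, here is a rough sketch that swaps in the JDK's W3C DOM for the JAXB object model. It is not docx4j API: the class name, method names, and sample XML are assumptions for illustration. The shape of the operation is the same, though: find the text element holding the placeholder and replace it, inside its parent run, with a break element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PageBreakDomReplace {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Parses namespace-aware XML; wraps checked exceptions for brevity. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Replaces each w:t whose text equals the placeholder with a page break. */
    static int replaceWithPageBreak(Document doc, String placeholder) {
        NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
        int replaced = 0;
        // The NodeList is live, so walk it backwards while removing nodes.
        for (int i = texts.getLength() - 1; i >= 0; i--) {
            Element t = (Element) texts.item(i);
            if (placeholder.equals(t.getTextContent())) {
                Element br = doc.createElementNS(W_NS, "w:br");
                br.setAttributeNS(W_NS, "w:type", "page");
                // Swap inside the parent run, so no new w:r is created.
                t.getParentNode().replaceChild(br, t);
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        Document doc = parse("<w:p xmlns:w=\"" + W_NS + "\">"
                + "<w:r><w:t>${pagebreak}</w:t></w:r></w:p>");
        System.out.println("Replaced: " + replaceWithPageBreak(doc, "${pagebreak}"));
    }
}
```

Because the swap happens on the tree rather than on a string, it cannot produce unbalanced or misplaced markup, which is the main attraction of working at the object level.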

Re: Replace a variable with a page break

Posted: Tue Apr 26, 2022 6:52 pm
by vickov
Sorry for my late response and thanks for the answer @jason,

I also noticed the limitation of the variable replacement method: it only replaces the text within the <w:t>, and it can't manipulate the element itself to change it into a page break.

I'll follow your advice and post again if/when I make progress.

Re: Replace a variable with a page break

Posted: Sat Apr 30, 2022 1:25 am
by vickov
@Jason I tried the marshal/unmarshal route, but I hit pretty much the same error again. Here is my code:

Code: Select all
File file = new File("resources/doctest.docx")
WordprocessingMLPackage wmlp = WordprocessingMLPackage.load(file)
wmlp.getMainDocumentPart().convertAltChunks()
MainDocumentPart mdp = wmlp.getMainDocumentPart()
String xml = XmlUtils.marshaltoString(wmlDocumentEl, true).replaceAll("<w:t>\\{pagebreak}</w:t>","<w:r><w:rPr><w:lang w:val=\"en-US\"/></w:rPr><w:br w:type=\"page\"/></w:r>")
def doc = (Document) XmlUtils.unmarshalString(xml)
mdp.setContents(doc)
wmlp.save(new File("resources/doctest-new.docx"))


This ends up with the error:
Code: Select all
javax.xml.bind.JAXBException: Preprocessing exception
- with linked exception:
[org.docx4j.openpackaging.exceptions.Docx4JException: Cannot perform the transformation]
   at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:574)
   at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:518)
   at org.docx4j.XmlUtils$unmarshalString$0.call(Unknown Source)
   at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
   at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
   at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
   at ideaGroovyConsole.run(ideaGroovyConsole.groovy:15)
   at groovy.lang.GroovyShell.runScriptOrMainOrTestOrRunnable(GroovyShell.java:254)
   at groovy.lang.GroovyShell.run(GroovyShell.java:360)
   at groovy.lang.GroovyShell.run(GroovyShell.java:339)
   at jdk.internal.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.base/java.lang.reflect.Method.invoke(Method.java:567)
   at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
   at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSite.invoke(PogoMetaMethodSite.java:170)
   at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:73)
   at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148)
   at console.run(console.groovy:11)
   at groovy.ui.GroovyMain.processReader(GroovyMain.java:631)
   at groovy.ui.GroovyMain.processFiles(GroovyMain.java:552)
   at groovy.ui.GroovyMain.run(GroovyMain.java:396)
   at groovy.ui.GroovyMain.access$1400(GroovyMain.java:68)
   at groovy.ui.GroovyMain$GroovyCommand.process(GroovyMain.java:322)
   at groovy.ui.GroovyMain.processArgs(GroovyMain.java:142)
   at groovy.ui.GroovyMain.main(GroovyMain.java:115)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.base/java.lang.reflect.Method.invoke(Method.java:567)
   at org.codehaus.groovy.tools.GroovyStarter.rootLoader(GroovyStarter.java:111)
   at org.codehaus.groovy.tools.GroovyStarter.main(GroovyStarter.java:129)
Caused by: org.docx4j.openpackaging.exceptions.Docx4JException: Cannot perform the transformation
   at org.docx4j.XmlUtils.transform(XmlUtils.java:1368)
   at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:569)
   ... 30 more
Caused by: javax.xml.transform.TransformerException: com.sun.istack.SAXParseException2; unexpected element (uri:"http://schemas.openxmlformats.org/wordprocessingml/2006/main", local:"r"). Expected elements are <{http://schemas.openxmlformats.org/wordprocessingml/2006/main}delInstrText>,<{http://schemas.openxmlformats.org/markup-compatibility/2006}AlternateContent>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}footnoteReference>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}footnoteRef>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}endnoteRef>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}sym>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}yearShort>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}endnoteReference>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}softHyphen>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}continuationSeparator>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}ptab>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}br>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pgNum>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}noBreakHyphen>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPr>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}tab>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}commentReference>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}drawing>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}lastRenderedPageBreak>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}monthLong>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pict>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}annotationRef>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}monthShort>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}instrText>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}cr>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}ruby>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dayShort>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}separator>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}yearLong>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}t>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}delText>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}dayLong>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}object>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}fldChar>
   at org.apache.xalan.transformer.ClonerToResultTree.cloneToResultTree(ClonerToResultTree.java:209)
   at org.apache.xalan.templates.ElemCopy.execute(ElemCopy.java:109)
   at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:395)
   at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:178)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2402)
   at org.apache.xalan.templates.ElemCopy.execute(ElemCopy.java:116)
   at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:395)
   at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:178)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2402)
   at org.apache.xalan.templates.ElemCopy.execute(ElemCopy.java:116)
   at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:395)
   at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:178)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2402)
   at org.apache.xalan.templates.ElemCopy.execute(ElemCopy.java:116)
   at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:395)
   at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:178)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2402)
   at org.apache.xalan.templates.ElemCopy.execute(ElemCopy.java:116)
   at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:395)
   at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:178)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2402)
   at org.apache.xalan.templates.ElemCopy.execute(ElemCopy.java:116)
   at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:395)
   at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:178)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2402)
   at org.apache.xalan.templates.ElemCopy.execute(ElemCopy.java:132)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2402)
   at org.apache.xalan.transformer.TransformerImpl.applyTemplateToNode(TransformerImpl.java:2272)
   at org.apache.xalan.transformer.TransformerImpl.transformNode(TransformerImpl.java:1358)
   at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:711)
   at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1275)
   at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1253)
   at org.docx4j.XmlUtils.transform(XmlUtils.java:1366)
   ... 31 more


I also noticed that the XML string I extract this way differs from the one in the docx's document.xml file; maybe that is the cause of my issue.

I haven't tried TraverseRemoveVanish yet; maybe it will help, and I'll look into it as well. In the meantime, I'd be happy if you could spot what I'm doing wrong in the code above.

Cheers,
Viktor

Re: Replace a variable with a page break

Posted: Sat Apr 30, 2022 6:55 am
by jason
Try:

Code: Select all
String xml = XmlUtils.marshaltoString(wmlDocumentEl, true).replaceAll("<w:t>\\{pagebreak}</w:t>","<w:br w:type=\"page\"/>")


or

Code: Select all
String xml = XmlUtils.marshaltoString(wmlDocumentEl, true).replaceAll("<w:r><w:t>\\{pagebreak}</w:t></w:r>","<w:r><w:rPr><w:lang w:val=\"en-US\"/></w:rPr><w:br w:type=\"page\"/></w:r>")


Your current code would result in w:r/w:r, i.e. a run nested inside a run, because it replaces only the w:t with a whole new w:r.
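
A quick way to see the difference, sketched with JDK-only code rather than docx4j. The class name, the helper, and the sample paragraph are all hypothetical; the check simply looks for any w:r whose parent is also a w:r.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NestedRunCheck {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns true if any w:r element is a direct child of another w:r. */
    static boolean hasNestedRun(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList runs = doc.getElementsByTagNameNS(W_NS, "r");
            for (int i = 0; i < runs.getLength(); i++) {
                Node parent = runs.item(i).getParentNode();
                if (parent != null
                        && W_NS.equals(parent.getNamespaceURI())
                        && "r".equals(parent.getLocalName())) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String template = "<w:p xmlns:w=\"" + W_NS + "\">"
                + "<w:r><w:t>${pagebreak}</w:t></w:r></w:p>";
        String breakRun = "<w:r><w:br w:type=\"page\"/></w:r>";

        // Replacing only the w:t with a whole run nests the new run in the old one.
        String nested = template.replace("<w:t>${pagebreak}</w:t>", breakRun);
        // Replacing the whole run with a run keeps the structure flat.
        String flat = template.replace("<w:r><w:t>${pagebreak}</w:t></w:r>", breakRun);

        System.out.println(hasNestedRun(nested)); // prints true
        System.out.println(hasNestedRun(flat));   // prints false
    }
}
```

The rule of thumb is that the search string and the replacement should sit at the same level: swap a w:t for a w:br, or a w:r for a w:r.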

Re: Replace a variable with a page break

Posted: Tue May 03, 2022 9:25 pm
by vickov
@Jason I guess the nested runs were the cause: using the first of your suggestions solved the issue, i.e. the unmarshalling succeeded and the document was built as expected.

I was led to believe that a run inside a run is possible by reading the contents of http://officeopenxml.com/, but I guess it contains some outdated or incorrect information. Are there any alternative references that you are aware of?

Thanks for your answers and support,
Cheers,
Viktor