To split a .docx to multiple pdf file by section

Posted: Thu Nov 08, 2012 7:05 pm
by Joey
Hi,

I'm trying to split a .docx file into multiple PDF files by section.
Here is how I do it.
I traverse all the paragraphs in the .docx file using new TraversalUtil(body, callback) with a TraversalUtil.Callback.
If paragraph.getPPr().getPStyle().getVal() is not equal to 1 or 2, the paragraph is stored in a List<Object> objectBuffer.
If paragraph.getPPr().getPStyle().getVal() is equal to 1 or 2, all the objects in objectBuffer are added to a new WordprocessingMLPackage using
wordMLPackage.getMainDocumentPart().addObject(tempObject).
Then I create a new PDF from that WordprocessingMLPackage.
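
The buffer-and-flush logic described above can be sketched without docx4j as follows. This is only a minimal model, not the actual application code: Para and SectionSplitter are hypothetical stand-ins for the docx4j paragraph objects and the traversal callback, and the choice to put each heading at the start of its own chunk is an assumption. The style ID is allowed to be null here because a paragraph without paragraph properties or without a style has no value to compare.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a docx4j paragraph: only the style ID and text matter here.
final class Para {
    final String styleId; // may be null when the paragraph has no style
    final String text;

    Para(String styleId, String text) {
        this.styleId = styleId;
        this.text = text;
    }
}

class SectionSplitter {

    // Assumption: style IDs "1" and "2" mark section headings, as in the post.
    static boolean startsSection(Para p) {
        return "1".equals(p.styleId) || "2".equals(p.styleId);
    }

    // Buffers paragraphs and flushes the buffer as one chunk each time a heading is met.
    static List<List<Para>> split(List<Para> body) {
        List<List<Para>> chunks = new ArrayList<>();
        List<Para> buffer = new ArrayList<>();
        for (Para p : body) {
            if (startsSection(p) && !buffer.isEmpty()) {
                chunks.add(buffer);          // close the previous section
                buffer = new ArrayList<>();  // start a new one
            }
            buffer.add(p);
        }
        if (!buffer.isEmpty()) {
            chunks.add(buffer); // flush the last section, which no heading follows
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Para> body = List.of(
            new Para("1", "Introduction"),
            new Para(null, "Some text."),
            new Para("2", "Details"),
            new Para(null, "More text."));
        for (List<Para> chunk : split(body)) {
            System.out.println(chunk.get(0).text + ": " + chunk.size() + " paragraph(s)");
        }
    }
}
```

In the real code each chunk would then be written to its own package and converted to PDF. Note the final flush after the loop: without it the last section is never exported.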

However, there are two problems in the PDF files.

1. Images are missing from the files.
The following error message is logged.

Code:
ERROR org.docx4j.XmlUtils .error line 946 - java.lang.NullPointerException
; Line#: 490; Column#: 37
javax.xml.transform.TransformerException: java.lang.NullPointerException
   at org.apache.xalan.extensions.ExtensionHandlerJavaPackage.callFunction(ExtensionHandlerJavaPackage.java:417)
   at org.apache.xalan.extensions.ExtensionHandlerJavaPackage.callFunction(ExtensionHandlerJavaPackage.java:440)
   at org.apache.xalan.extensions.ExtensionsTable.extFunction(ExtensionsTable.java:222)
   at org.apache.xalan.transformer.TransformerImpl.extFunction(TransformerImpl.java:473)
   at org.apache.xpath.functions.FuncExtFunction.execute(FuncExtFunction.java:208)
   at org.apache.xpath.XPath.execute(XPath.java:337)
   at org.apache.xalan.templates.ElemCopyOf.execute(ElemCopyOf.java:134)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2400)
   at org.apache.xalan.templates.ElemChoose.execute(ElemChoose.java:128)
   at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:395)
   at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:178)
   at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:395)
   at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:178)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2400)
   at org.apache.xalan.transformer.TransformerImpl.transformToRTF(TransformerImpl.java:1988)
   at org.apache.xalan.transformer.TransformerImpl.transformToRTF(TransformerImpl.java:1910)
   at org.apache.xalan.templates.ElemVariable.getValue(ElemVariable.java:312)
   at org.apache.xalan.templates.ElemVariable.execute(ElemVariable.java:248)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2400)
   at org.apache.xalan.templates.ElemChoose.execute(ElemChoose.java:128)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2400)
   at org.apache.xalan.templates.ElemChoose.execute(ElemChoose.java:141)
   at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:395)
   at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:178)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2400)
   at org.apache.xalan.transformer.TransformerImpl.transformToRTF(TransformerImpl.java:1988)
   at org.apache.xalan.transformer.TransformerImpl.transformToRTF(TransformerImpl.java:1910)
   at org.apache.xalan.templates.ElemVariable.getValue(ElemVariable.java:312)
   at org.apache.xalan.templates.ElemVariable.execute(ElemVariable.java:248)
   at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:395)
   at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:178)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2400)
   at org.apache.xalan.templates.ElemLiteralResult.execute(ElemLiteralResult.java:1376)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2400)
   at org.apache.xalan.templates.ElemLiteralResult.execute(ElemLiteralResult.java:1376)
   at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:395)
   at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:178)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2400)
   at org.apache.xalan.templates.ElemLiteralResult.execute(ElemLiteralResult.java:1376)
   at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:395)
   at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:178)
   at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2400)
   at org.apache.xalan.transformer.TransformerImpl.applyTemplateToNode(TransformerImpl.java:2270)
   at org.apache.xalan.transformer.TransformerImpl.transformNode(TransformerImpl.java:1356)
   at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:709)
   at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1273)
   at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1251)
   at org.docx4j.XmlUtils.transform(XmlUtils.java:834)
   at org.docx4j.XmlUtils.transform(XmlUtils.java:727)
   at org.docx4j.convert.out.pdf.viaXSLFO.Conversion.output(Conversion.java:349)
   at com.mycompany.doc4jtry.App.exportPDF(App.java:295)
   at com.mycompany.doc4jtry.App$1.apply(App.java:221)
   at com.mycompany.doc4jtry.App$1.walkJAXBElements(App.java:167)
   at org.docx4j.TraversalUtil.<init>(TraversalUtil.java:151)
   at com.mycompany.doc4jtry.App.main(App.java:152)
Caused by: java.lang.NullPointerException
   at org.docx4j.openpackaging.parts.relationships.RelationshipsPart.getPart(RelationshipsPart.java:257)
   at org.docx4j.model.images.AbstractWordXmlPicture.handleImageRel(AbstractWordXmlPicture.java:260)
   at org.docx4j.model.images.WordXmlPictureE20.createWordXmlPictureFromE20(WordXmlPictureE20.java:268)
   at org.docx4j.model.images.WordXmlPictureE20.createXslFoImgE20(WordXmlPictureE20.java:346)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at org.apache.xalan.extensions.ExtensionHandlerJavaPackage.callFunction(ExtensionHandlerJavaPackage.java:343)
   ... 54 more


2. Fonts and styles are not rendered correctly.
Some tables have no borders.
I have no idea what causes this.

I'd appreciate any suggestions.

Joey

Re: To split a .docx to multiple pdf file by section

Posted: Thu Nov 08, 2012 7:29 pm
by jason
Hi Joey

Yes, you need to deal with those issues. Not just images and styles, but numbering, comments, footnotes etc - anything that has a relationship to another part.

The way you are doing it raises the same issues as merging Word documents; see http://www.docx4java.org/blog/2010/11/m ... documents/

In fact, MergeDocx includes code to do what you want.

If you don't want to license that, there is another way. Instead of writing objects to a new WordprocessingMLPackage, you can clone your existing package, then delete items from it. This would not be as efficient. For example, if your source package included 100 images, and your chunk only used 1 of them, the other 99 would still be in the chunk, making it larger.

This way you would still have some issues, for example:

- if there is numbering, each chunk would start with 1.
- you are likely to run into https://issues.apache.org/bugzilla/show ... i?id=54094

(The MergeDocx code makes some effort to address these issues.)
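
To illustrate the trade-off between the two approaches, here is a minimal sketch that does not use docx4j at all. PackageModel is a hypothetical, heavily simplified model of a package: body items, plus image parts looked up by relationship ID. The "img:rIdN" notation for a picture reference is likewise an assumption made for the example. It shows that copying body objects into an empty package leaves the relationship IDs dangling (which is probably what the NullPointerException in RelationshipsPart.getPart reflects), while clone-then-delete keeps every reference resolvable at the cost of carrying unused parts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a package: body items plus image parts keyed by relationship ID.
class PackageModel {
    final List<String> body;         // "img:rId5" stands for a picture that references rId5
    final Map<String, byte[]> parts; // relationship ID -> image bytes

    PackageModel(List<String> body, Map<String, byte[]> parts) {
        this.body = new ArrayList<>(body);
        this.parts = new LinkedHashMap<>(parts);
    }

    // Clone-then-delete: copy everything, then keep only the body items in [from, to).
    PackageModel chunk(int from, int to) {
        PackageModel copy = new PackageModel(body, parts);
        List<String> kept = new ArrayList<>(copy.body.subList(from, to));
        copy.body.clear();
        copy.body.addAll(kept);
        return copy; // all parts are still present, used or not
    }

    // True when every picture reference in the body resolves to a part.
    boolean allReferencesResolve() {
        for (String item : body) {
            if (item.startsWith("img:") && !parts.containsKey(item.substring(4))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, byte[]> images = new LinkedHashMap<>();
        images.put("rId1", new byte[10]);
        images.put("rId2", new byte[10]);
        PackageModel source = new PackageModel(
            List.of("Heading A", "img:rId1", "Heading B", "img:rId2"), images);

        // Copying objects into an empty package: the reference to rId1 has no target.
        PackageModel copied = new PackageModel(source.body.subList(0, 2), Map.of());
        System.out.println("copied resolves: " + copied.allReferencesResolve());

        // Clone-then-delete: the reference resolves, but the unused rId2 part is still carried.
        PackageModel cloned = source.chunk(0, 2);
        System.out.println("cloned resolves: " + cloned.allReferencesResolve()
            + ", parts carried: " + cloned.parts.size());
    }
}
```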

Hope this helps .. Jason

Re: To split a .docx to multiple pdf file by section

Posted: Thu Nov 08, 2012 8:07 pm
by Joey
Thanks for your reply.
Although it seems like bad news for me...
I'll try the approach of deleting objects.
Or, in the worst case, convert it to an .html file and then split that.

The scenario is that I need to make an online user manual, but I only have a .docx file.
To avoid long loading times, the file will be split by section.
Does anybody know a better way to implement this?