Page 1 of 1

Performance problems

PostPosted: Fri May 22, 2015 9:17 am
by Ivan
Hi, I am using docX4j to transform into different formats based on word templates created with openDope.

Performance is bad, and i am wondering if I am doing something wrong.

Using small data sets the solution performs relatively bad, but when the data set gets really big the bindingHandler.applyBindings() takes forever and uses a lot of cpu.

I have created a small project on bitbucket to reproduce:
https://bitbucket.org/sundti/docx4jissue/src/fcbc661a68bc177c4d15f632f84e4a785eb2bfaf/src/main/java/org/docx4jissue/PdfCreator.java?at=default

This is the console output after using an xml resulting in a 25 page pdf:

/usr/lib/jvm/java-8-oracle/bin/java -Xmx1024m -Dorg.apache.xml.dtm.DTMManager=org.apache.xml.dtm.ref.DTMManagerDefault -Didea.launcher.port=7534 -Didea.launcher.bin.path=/home/ivan/Tools/idea-IU-141.713.2/bin -Dfile.encoding=UTF-8 -classpath /usr/lib/jvm/java-8-oracle/jre/lib/jsse.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jce.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jfr.jar:/usr/lib/jvm/java-8-oracle/jre/lib/charsets.jar:/usr/lib/jvm/java-8-oracle/jre/lib/javaws.jar:/usr/lib/jvm/java-8-oracle/jre/lib/plugin.jar:/usr/lib/jvm/java-8-oracle/jre/lib/resources.jar:/usr/lib/jvm/java-8-oracle/jre/lib/management-agent.jar:/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jfxswt.jar:/usr/lib/jvm/java-8-oracle/jre/lib/deploy.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/jfxrt.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/sunjce_provider.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/cldrdata.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/dnsns.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/nashorn.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/zipfs.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/localedata.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/sunpkcs11.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/sunec.jar:/home/ivan/infosynergi/docx4jissue/target/classes:/home/ivan/.m2/repository/org/docx4j/docx4j/3.2.1/docx4j-3.2.1.jar:/home/ivan/.m2/repository/org/plutext/jaxb-svg11/1.0.2/jaxb-svg11-1.0.2.jar:/home/ivan/.m2/repository/org/plutext/jaxb-xslfo/1.0.1/jaxb-xslfo-1.0.1.jar:/home/ivan/.m2/repository/org/plutext/jaxb-xmldsig-core/1.0.0/jaxb-xmldsig-core-1.0.0.jar:/home/ivan/.m2/repository/net/engio/mbassador/1.1.10/mbassador-1.1.10.jar:/home/ivan/.m2/repository/org/slf4j/slf4j-api/1.7.5/slf4j-api-1.7.5.jar:/home/ivan/.m2/repository/org/slf4j/jcl-over-slf4j/1.7.5/jcl-over-slf4j-1.7.5.jar:/home/ivan/.m2/repository/org/slf4j/slf4j-log4j12/1.7.5/slf4j-log4j12-1.7.5.jar:/home/ivan/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/ivan/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:/home/ivan/.m2/repository/commons-codec/commons-codec/1.3/commons-codec-1.3.jar:/home/ivan/.m2/repository/commons-io/commons-io/1.3.1/commons-io-1.3.1.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/xmlgraphics-commons/1.5/xmlgraphics-commons-1.5.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/fop/1.1/fop-1.1.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-svg-dom/1.7/batik-svg-dom-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-anim/1.7/batik-anim-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-css/1.7/batik-css-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-dom/1.7/batik-dom-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-parser/1.7/batik-parser-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-util/1.7/batik-util-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-bridge/1.7/batik-bridge-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-script/1.7/batik-script-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-js/1.7/batik-js-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-xml/1.7/batik-xml-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-awt-util/1.7/batik-awt-util-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-gvt/1.7/batik-gvt-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-transcoder/1.7/batik-transcoder-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-svggen/1.7/batik-svggen-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-extension/1.7/batik-extension-1.7.jar:/home/ivan/.m2/repository/org/apache/xmlgraphics/batik-ext/1.7/batik-ext-1.7.jar:/home/ivan/.m2/repository/org/apache/avalon/framework/avalon-framework-api/4.3.1/avalon-framework-api-4.3.1.jar:/home/ivan/.m2/repository/org/apache/avalon/framework/avalon-framework-impl/4.3.1/avalon-framework-impl-4.3.1.jar:/home/ivan/.m2/repository/xalan/xalan/2.7.1/xalan-2.7.1.jar:/home/ivan/.m2/repository/xalan/serializer/2.7.1/serializer-2.7.1.jar:/home/ivan/.m2/repository/net/arnx/wmf2svg/0.9.0/wmf2svg-0.9.0.jar:/home/ivan/.m2/repository/org/apache/poi/poi-scratchpad/3.8/poi-scratchpad-3.8.jar:/home/ivan/.m2/repository/org/apache/poi/poi/3.8/poi-3.8.jar:/home/ivan/.m2/repository/org/antlr/antlr-runtime/3.3/antlr-runtime-3.3.jar:/home/ivan/.m2/repository/org/antlr/stringtemplate/3.2.1/stringtemplate-3.2.1.jar:/home/ivan/.m2/repository/antlr/antlr/2.7.7/antlr-2.7.7.jar:/home/ivan/.m2/repository/com/google/guava/guava/17.0/guava-17.0.jar:/home/ivan/.m2/repository/org/springframework/spring-core/4.1.5.RELEASE/spring-core-4.1.5.RELEASE.jar:/home/ivan/.m2/repository/commons-logging/commons-logging/1.2/commons-logging-1.2.jar:/home/ivan/Tools/idea-IU-141.713.2/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain org.docx4jissue.PdfCreator /home/ivan/infosynergi/docx4jissue/src/main/java/org/docx4jissue/Kontoutskrift.docx /home/ivan/infosynergi/docx4jissue/src/main/java/org/docx4jissue/templateData.xml

#---- BEGIN writeEnvironmentReport($Revision$): Useful stuff found: ----
version.DOM.draftlevel=2.0fd
version.JAXP=1.1 or higher
java.ext.dirs=/usr/lib/jvm/java-8-oracle/jre/lib/ext:/usr/java/packages/lib/ext
version.xerces2=not-present
version.xerces1=not-present
version.xalan2_2=Xalan Java 2.7.1
version.xalan1=not-present
version.ant=not-present
java.version=1.8.0_45
version.DOM=2.0
version.crimson=not-present
sun.boot.class.path=/usr/lib/jvm/java-8-oracle/jre/lib/resources.jar:/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar:/usr/lib/jvm/java-8-oracle/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jsse.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jce.jar:/usr/lib/jvm/java-8-oracle/jre/lib/charsets.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jfr.jar:/usr/lib/jvm/java-8-oracle/jre/classes
version.SAX=2.0
version.xalan2x=Xalan Java 2.7.1
#----- END writeEnvironmentReport: Useful properties found: -----
# YAHOO! Your environment seems to be OK.
log4j:WARN No appenders could be found for logger (org.docx4j.jaxb.Context).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

StopWatch '': running time (millis) = 90815
-----------------------------------------
ms % Task name
-----------------------------------------
03290 004% Load wordtemplate
06302 007% OpenDoPEHandler.preprocess
07276 008% OpenDoPEIntegrity.process
37470 041% bindingHandler.applyBindings()
03761 004% RemovalHandler
03762 004% Init fonts
00003 000% foSettings
28951 032% Docx4J.toFO

StopWatch '': running time (millis) = 72792
-----------------------------------------
ms % Task name
-----------------------------------------
00070 000% Load wordtemplate
03900 005% OpenDoPEHandler.preprocess
03507 005% OpenDoPEIntegrity.process
34914 048% bindingHandler.applyBindings()
03606 005% RemovalHandler
02896 004% Init fonts
00000 000% foSettings
23899 033% Docx4J.toFO

Please have a look and let me know if there is something wrong with my code, or if this is the expected performance.

Re: Performance problems

PostPosted: Mon May 25, 2015 4:02 pm
by jason
I can't comment on "expected performance" without knowing your hardware or input document/xml size
(though a 25 page output document is hardly big!).

In your numbers, most of the time is taken in the following 2 steps:

048% bindingHandler.applyBindings()
033% Docx4J.toFO

So you could have a look to see what in applyBindings is taking up the time.

Regarding Docx4J.toFO, docx4j's PDF output is fairly slow. The PDF output uses
XSL FO; it is a 2 step process: (i) generate the XSL FO, then (ii) use
Apache FOP to convert the XSL FO to PDF.

Plutext has recently released a standalone commercial PDF Converter which doesn't use FOP. It's much quicker. See http://www.plutext.com/m/index.php/prod ... o-pdf.html

Finally, if you are launching the JVM afresh, it'll be slower. If you run the
test say 20 times, you should see it speed up.

Re: Performance problems

PostPosted: Tue May 26, 2015 10:50 pm
by Ivan
Naturally it performs better the second time. After that I don't see much improvement.

I did a cpu sample of the test, running it ten times, without the toFO part resultet in this. 85% of the time is spent inside the applyBindings method.

applyTemplates.png
applyTemplates.png (67.48 KiB) Viewed 3317 times


Test was run on an Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
8 cores, 7,8gb ram, ssd drive.

Re: Performance problems

PostPosted: Wed May 27, 2015 12:18 am
by jason
What types of content controls do you bind? See the types in https://github.com/plutext/docx4j/blob/ ... /bind.xslt

If you only use the types implemented in https://github.com/plutext/docx4j/blob/ ... nXSLT.java
(ie basic text binds and images)

then following

https://github.com/plutext/docx4j/commi ... 15b460fc9a

you can try property docx4j.model.datastorage.BindingHandler.Implementation=BindingTraverserNonXSLT

NonXSLT is typically twice as fast.

Re: Performance problems

PostPosted: Wed May 27, 2015 6:36 am
by Ivan
Thanks for replying so quickly!

That change shows a significant improvement. 38% faster on the applyBindings part, 21% on the whole process.

I am able to produce the same pdf using iText in no time though, so I guess it will be up to the users how important it is to be able to customise the word templates.

It was difficult to find documentation on the new pdfGenerator, guess I should give it a try.