
XmlUtils.getNewDocumentBuilder() blocking on factory

PostPosted: Wed Jul 22, 2015 1:37 am
by roded
Hi,
I'm running a multithreaded application in which each thread is a heavy docx4j user.
Profiling the application, I'm noticing some contention around XmlUtils.getNewDocumentBuilder(), which blocks on the static instance of DocumentBuilderFactory.

From what I understand from https://jaxp.java.net/docs/spec/html/#p ... ead-safety, the DocumentBuilderFactory.newDocumentBuilder() method should be thread-safe:
"It is expected that the newSAXParser method of a SAXParserFactory implementation, the newDocumentBuilder method of a DocumentBuilderFactory and the newTransformer method of a TransformerFactory will be thread safe without side effects..."


Is there a reason I'm missing for this synchronization?
Removing it would speed things up (in cases like mine).

On the same note, regarding the comments in XmlUtils on sharing DocumentBuilders, ThreadLocal seems like the optimal solution. I'm not sure how much it would save, though, as I don't know how much it costs to create a new DocumentBuilder.
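
To make the ThreadLocal idea concrete, here is a minimal sketch. PerThreadDocumentBuilder is a hypothetical helper, not part of docx4j, and the factory settings are assumptions (docx4j configures its own factory). It still synchronizes on the factory, but only once per thread when the builder is first created, so the hot path is lock-free. Since some implementations may not support DocumentBuilder.reset(), the sketch falls back to creating a fresh builder in that case.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper for illustration; not part of docx4j.
final class PerThreadDocumentBuilder {

    // Assumption: a plain namespace-aware factory. docx4j's real factory
    // configuration may differ, so treat these settings as placeholders.
    private static final DocumentBuilderFactory FACTORY = newFactory();

    // One DocumentBuilder per thread, created lazily on first use.
    private static final ThreadLocal<DocumentBuilder> BUILDER =
            ThreadLocal.withInitial(PerThreadDocumentBuilder::newBuilder);

    private PerThreadDocumentBuilder() {}

    private static DocumentBuilderFactory newFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory;
    }

    private static DocumentBuilder newBuilder() {
        // Still guarded, in case the configured factory is not thread-safe,
        // but this runs once per thread rather than once per call.
        synchronized (FACTORY) {
            try {
                return FACTORY.newDocumentBuilder();
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Returns the calling thread's builder, reset to its initial state. */
    static DocumentBuilder get() {
        DocumentBuilder builder = BUILDER.get();
        try {
            builder.reset();
        } catch (UnsupportedOperationException e) {
            // The implementation does not support reset(): replace the builder.
            builder = newBuilder();
            BUILDER.set(builder);
        }
        return builder;
    }

    /** Drops the calling thread's builder, e.g. before a pooled thread is reused. */
    static void release() {
        BUILDER.remove();
    }
}
```

In a container with pooled threads, the builder stays attached to the thread until release() is called, so that call probably belongs in a finally block or a request filter.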

Cheers,
Roded

N.B. If GitHub is the right place for this, do tell and I'll move the discussion.

Re: XmlUtils.getNewDocumentBuilder() blocking on factory

PostPosted: Wed Jul 22, 2015 11:41 pm
by jason
Compare http://stackoverflow.com/questions/9828 ... -in-java-5

It might be thread safe, but bear in mind the user can determine which implementation of DocumentBuilderFactory they wish to use, so short of testing the various common ones, I thought better safe than sorry...
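
For testing a particular implementation, a rough smoke test could look like the sketch below. FactoryConcurrencyCheck and its parameters are hypothetical names, not docx4j code. It calls newDocumentBuilder() on one shared, unsynchronized factory from several threads at once and counts calls that throw or return null. A clean run does not prove thread safety; it only fails to find a problem under this particular load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical smoke test for illustration; not part of docx4j.
final class FactoryConcurrencyCheck {

    private FactoryConcurrencyCheck() {}

    /** Counts newDocumentBuilder() calls that throw or return null under concurrent use. */
    static int countFailures(DocumentBuilderFactory factory, int threads, int callsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<Integer>> results = new ArrayList<>();
        try {
            for (int t = 0; t < threads; t++) {
                Callable<Integer> task = () -> {
                    start.await(); // release all workers together to maximize overlap
                    int failures = 0;
                    for (int i = 0; i < callsPerThread; i++) {
                        try {
                            if (factory.newDocumentBuilder() == null) {
                                failures++;
                            }
                        } catch (ParserConfigurationException | RuntimeException e) {
                            failures++;
                        }
                    }
                    return failures;
                };
                results.add(pool.submit(task));
            }
            start.countDown();
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Uses whichever implementation JAXP resolves on the current classpath.
        int failures = countFailures(DocumentBuilderFactory.newInstance(), 10, 500);
        System.out.println("failed newDocumentBuilder() calls: " + failures);
    }
}
```

Running it with each candidate implementation on the classpath (or selected through the javax.xml.parsers.DocumentBuilderFactory system property) would give at least a first indication.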

I haven't tried any of the other approaches discussed at https://community.oracle.com/thread/1626108

ThreadLocal<DocumentBuilder> in https://svn.apache.org/repos/asf/shindi ... lUtil.java

but see http://java.jiderhamn.se/2012/01/29/cla ... iate-name/

How much time does removing the synch save you?

Re: XmlUtils.getNewDocumentBuilder() blocking on factory

PostPosted: Thu Jul 23, 2015 7:44 am
by roded
First screenshot shows 10 threads running and producing a docx each.
Roughly speaking, the first half of each thread's run is calculations and the second half is docx4j processing (traversing, modifying, saving, etc.).
The blocks on the second halves of the runs are due to org.docx4j.XmlUtils.getNewDocumentBuilder() XmlUtils.java:126.
[Screenshot 1: 2015-07-22 23_21_48-structures-webapp (no packaging) - YourKit Java Profiler 2015 build 15068 - 64-b.png]


Taking a look at YourKit's Monitor Usage tab (see second screenshot), all of Tomcat's threads were blocked at some point due to getNewDocumentBuilder().
Summing the time each thread was blocked on the method gives 3601ms out of a total CPU time (all threads) of 35147ms.
So to guesstimate, I'd save around 10% of my total time if I could mitigate this block. I think...
[Screenshot 2: 2015-07-22 23_32_37-structures-webapp (no packaging) - YourKit Java Profiler 2015 build 15068 - 64-b.png]
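
For what it's worth, the guesstimate works out as below. ContentionEstimate is just an illustrative name. Note that blocked time is wall-clock time while the denominator is CPU time, so the ratio is only a rough indicator.

```java
// Illustrative arithmetic only; the two figures are the ones quoted from the profiler.
final class ContentionEstimate {

    private ContentionEstimate() {}

    /** Fraction of the total time that was spent blocked. */
    static double blockedShare(long blockedMillis, long totalMillis) {
        if (totalMillis <= 0) {
            throw new IllegalArgumentException("totalMillis must be positive");
        }
        return (double) blockedMillis / totalMillis;
    }

    public static void main(String[] args) {
        double share = blockedShare(3601, 35147);
        // Roughly 0.10, i.e. about a tenth of the total.
        System.out.println("blocked share: " + share);
    }
}
```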


That last link is interesting, I was not aware of this issue.

Roded

Re: XmlUtils.getNewDocumentBuilder() blocking on factory

PostPosted: Fri Jul 24, 2015 6:12 pm
by jason
I'm curious, what docx4j functionality are you using which is invoking XmlUtils.getNewDocumentBuilder() so much?

Re: XmlUtils.getNewDocumentBuilder() blocking on factory

PostPosted: Sat Jul 25, 2015 1:40 am
by roded
A bit hackish...
I'm cloning MainDocumentParts by setting their package and calling MainDocumentPart.getContent() which implicitly loads its contents.
I suspect this is faster than iterating over the parts' content and calling XmlUtils.deepCopy() for each child object (even considering the contention on the document builders).
Each thread is cloning quite a few other MainDocumentParts in this case.