PDF output performance for long documents

Posted: Thu Nov 03, 2011 3:39 am
by rpmajithia
Hi All,

When I try to convert a docx file with hundreds of pages, it takes a very long time. How can I improve the performance of this conversion?
Thanks,
Ravi

Re: Facing performance issue

Posted: Thu Nov 03, 2011 9:16 am
by jason
Convert what to what?

Where is the performance issue? Is it on load, or elsewhere?

Re: Facing performance issue

Posted: Fri Nov 04, 2011 6:12 pm
by rpmajithia
Hi,

Whenever we try to convert docx files with 1000s of pages to PDF, it takes a very long time. How can I improve the performance of the application?

Re: PDF output performance for long documents

Posted: Sat Nov 05, 2011 1:19 am
by lucasfgc
Please read Jason's post carefully and answer it...

What kind of docx are you converting?
Text? Tables? Figures?

How long is "so much time" for you?

Give us more information...

Re: PDF output performance for long documents

Posted: Sat Nov 05, 2011 1:53 am
by rpmajithia
Sorry, I was unable to understand Jason's question.

My docx file contains everything: text, tables, images, etc. We convert this file to FO and then generate our PDF from that. This conversion takes a long time, so how can I reduce it?

Re: PDF output performance for long documents

Posted: Sun Nov 06, 2011 2:47 pm
by jason
When Ravi first posted, the subject didn't say "PDF". At least that much is now clear :-)

I have done some performance testing on the PDF output before. My testing was with multiple threads (1 document per thread), and my stats count the total number of pages across *all* threads.

On the fastest hardware I have, I got 40-60 pages per second. With slower hardware, you might get 2-15 pages per second.

What CPU / RAM are you using? Have you optimised your JVM memory with -Xmx etc?
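
As a quick sanity check on the memory question, a few lines of plain Java can report how much heap the JVM actually has available, which shows whether an -Xmx setting is taking effect. This is only a sketch; the class name HeapCheck is made up for illustration and is not part of docx4j or FOP.

```java
// Hypothetical helper (not part of docx4j or FOP): reports the JVM's heap ceiling.
class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMb() + " MB");
    }
}
```

Running it as, for example, `java -Xmx2g HeapCheck` and again without the flag should show whether the setting is reaching the process that does the conversion.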

Note that I didn't get 40-60 pages per second from a single thread - I don't have that number. These are just indicative figures; they'd vary based on what is in your document.

Speaking generally, the PDF output process has 2 steps:
(1) creating the XSL FO (which docx4j does, not FOP),
(2) creating the PDF from the XSL FO (which FOP does).

You need to work out which of these 2 steps is taking the most time, and determine whether that step can be sped up, or whether another approach is needed.
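
One way to find out which step dominates is to time each one separately. The sketch below is plain Java with no docx4j or FOP calls in it: the two Runnable bodies are stand-ins (simulated here with sleeps) that you would replace with your own code for step 1 and step 2, and the name StepTimer is made up for illustration.

```java
// Hypothetical timing harness; the step bodies are placeholders, not real docx4j/FOP calls.
class StepTimer {

    /** Runs a step once and returns its wall-clock duration in milliseconds. */
    static long timeMillis(Runnable step) {
        long start = System.nanoTime();
        step.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Stand-in for real work, so the sketch runs on its own. */
    static void simulateWork(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Replace with your code that produces the XSL FO from the docx (step 1).
        Runnable createFo = () -> simulateWork(50);
        // Replace with your code that renders the XSL FO to PDF (step 2).
        Runnable renderPdf = () -> simulateWork(120);

        long foMs = timeMillis(createFo);
        long pdfMs = timeMillis(renderPdf);

        System.out.println("Step 1 (docx -> XSL FO): " + foMs + " ms");
        System.out.println("Step 2 (XSL FO -> PDF):  " + pdfMs + " ms");
    }
}
```

A single run includes JVM warm-up, so it is probably worth converting the same document a few times and ignoring the first measurement.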

If the problem is step 2, then see the FOP mailing list where there have been some recent posts about performance. If it came to it, you could try another FO processor.

A thought from left-field: does your output have page numbers and cross-references? If not, you may be able to split it into, say, 4 chunks, process each concurrently, and then join the resulting 4 PDFs into a single one again.
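
The chunking idea could be sketched roughly as follows, using only the standard java.util.concurrent classes. Everything specific here is an assumption: ChunkedRender and renderAll are made-up names, the renderChunk function stands in for whatever converts one chunk to PDF, and splitting the docx and joining the PDFs are left out entirely (joining would need a PDF library).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: renders chunks in parallel and returns results in chunk order.
class ChunkedRender {

    static <I, O> List<O> renderAll(List<I> chunks, Function<I, O> renderChunk, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<O>> tasks = new ArrayList<>();
            for (I chunk : chunks) {
                tasks.add(() -> renderChunk.apply(chunk));
            }
            // invokeAll returns futures in the same order as the submitted tasks,
            // so the outputs line up with the original chunk order for joining.
            List<O> results = new ArrayList<>();
            for (Future<O> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> chunks = Arrays.asList("part1.docx", "part2.docx", "part3.docx", "part4.docx");
        // Placeholder renderer: a real one would run the docx -> FO -> PDF conversion.
        List<String> pdfs = renderAll(chunks, name -> name.replace(".docx", ".pdf"), 4);
        System.out.println(pdfs);
    }
}
```

Keeping the results in chunk order matters because the final join has to put the pages back in their original sequence. Whether this actually helps depends on having enough cores and heap for several conversions at once.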