High CPU when creating pdf using DOCx4j

Posted: Fri Feb 05, 2021 11:48 pm
by deepanshu.gupta
Hi,
I have a use case in which I need to convert many docx files to PDF after replacing some variables in a docx template. I have tried two approaches to convert docx to PDF; these are the two calls I have used:

Docx4J.toPDF(wordMLPackage, new FileOutputStream(file));
Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);

Both of these approaches become a bottleneck in my process because they consume a lot of CPU and memory. Can anyone suggest a better solution or an alternative?
P.S. I am converting all the documents to PDFs in a multithreaded environment.

Re: High CPU when creating pdf using DOCx4j

Posted: Sat Feb 06, 2021 7:59 am
by jason
PDF generation is inherently CPU intensive, and uses more CPU the longer and more complex (e.g. tables) your documents get.

That said, PDF via FO is a "cheap and cheerful" solution that is less efficient than some of the alternatives: it is a two-step process that uses XSL FO as an intermediary format.

For alternatives in the docx4j world, please see https://www.docx4java.org/blog/2020/09/ ... x4j-8-2-3/

If you want to use less local CPU, consider the Microsoft Graph approach.

Re: High CPU when creating pdf using DOCx4j

Posted: Mon Feb 08, 2021 3:44 pm
by monika_thakran
Hi Jason,

Thanks for your reply.

Regarding conversion to XSL-FO: currently we parse the docx document every time to generate each of our 300k PDFs, each with different variable values.
Code:
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new ByteArrayInputStream(bytes));
    VariablePrepare.prepare(wordMLPackage);
    MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();


Is it somehow possible to store the XSL-FO generated from the docx on the system, and then keep generating PDFs from it after replacing the variables for each set?

Currently we parse the document 300k times, even though it is actually the same document.
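
To make the "parse once, substitute many times" idea concrete, here is a rough sketch using only the plain JDK. It assumes the FO was exported a single time with the ${...} placeholders still in it. The class FoTemplate and its fill method are hypothetical names for illustration, not part of docx4j, and rendering the filled FO to PDF (for example with an FO processor such as Apache FOP) is left out.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a docx4j class: holds FO text that was produced
// once from the template, with ${name} placeholders left unreplaced.
final class FoTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");
    private final String fo;

    FoTemplate(String fo) {
        this.fo = fo;
    }

    // Returns a copy of the cached FO with each ${key} replaced by its value.
    // Values are XML-escaped; placeholders with no value are left untouched.
    String fill(Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(fo);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value == null) ? m.group() : escapeXml(value);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // The FO string here is a stand-in for the stored export.
        FoTemplate template = new FoTemplate("<fo:block>Dear ${name}, ref ${id}</fo:block>");
        System.out.println(template.fill(Map.of("name", "A & B", "id", "42")));
    }
}
```

This only works if each placeholder survives the export as one contiguous string, so the FO would need to be generated after VariablePrepare.prepare has run on the template. Whether plain text substitution is enough also depends on the variables: values that change the layout (extra table rows, for instance) would still need the docx to be processed per document.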

Re: High CPU when creating pdf using DOCx4j

Posted: Mon Feb 08, 2021 6:33 pm
by jason