Sep 05 2020

Office pptx/xlsx/docx to PDF to in docx4j 8.2.3

docx4j 8.2.3 facilitates 3 distinct ways to convert Microsoft Word docx documents to PDF. There are also possibilities for converting pptx or xlsx to PDF.

The three approaches:

  • export-fo: the content is converted to XSL FO, and from there, to PDF (or any of the other formats supported by Apache FOP)
  • documents4j: since 8.2.0, use Microsoft Word to do the conversion
  • via-Microsoft-Graph: new in 8.2.3, use java-docx-to-pdf-using-Microsoft-Graph to do the conversion

So which should you choose? The following table covers some of the things you might want to consider:

export-FO Microsoft Graph documents4j
Overview Conversion of docx to XSL FO, then uses Apache FOP to convert to PDF Uses Microsoft’s cloud Uses your Microsoft Office installation 
Fidelity Suitable for simple documents (text, tables, supported image types, header/footers) 100% (Microsoft’s fidelity) 100% (Microsoft’s fidelity)
Suitability simple docx docx, pptx, xlsx docx, xlsx 
License considerations ASL v2 Refer applicable Microsoft cloud terms Refer Microsoft EULA governing your Office install 
(increasingly restricted with each release)
Cost Free Microsoft cloud costs (Microsoft Office)
Confidentiality documents don’t leave your server documents go to Microsoft cloud documents need not leave your servers
Other advantages – Fast XSL FO/PDF templating for high volume PDF creation
– Open source, so can be extended
– Microsoft encourages this approach
– Microsoft cloud handles scalability
– Can update a docx table of contents
– Can convert RTF and binary .doc
Other disadvantages Two step (docx to XSL FO to PDF) processing is slower (except for XSL FO templating) – Dependency on 3rd party cloud
– Currently can’t update docx table of contents
vote to fix
– documents4j doesn’t support pptx
– Not supported by Microsoft

No Responses so far »

Comment RSS · TrackBack URI

Say your words