Sep 05 2014

docx to PDF in C#/.NET

How to convert docx to PDF without using Microsoft Word?

If your docx is mainly text, tables and images, docx4j.NET may work well for you.  Edit (Feb 2015): if not, you may be interested in our new commercial high-fidelity PDF renderer.

docx4j.NET is open source (Apache software license v2), identical to the Java version, but made into a DLL using IKVM.  Currently we’re at v3.2.0, released last week.

It is easy to test; you can upload your docx to the docx4j demo webapp.

Or, with very little effort, you can run it from a sample project in Visual Studio.  It's very easy, because docx4j.NET is in the NuGet.org repository.

To create your sample project:

  1. make sure you have NuGet Package Manager installed
    • for VS 2012 and later, it's installed by default
    • for VS 2010, NuGet is available through the Visual Studio Extension Manager; see the above link.
  2. create a new project in Visual Studio (File > New > Project).  A Console Application is fine.  I chose that from the .NET 3.5 list.
  3. from the Tools menu, choose NuGet Package Manager > Package Manager Console
  4. type Install-Package docx4j.NET

You should see something like:

And then, your project/solution will be populated to look like:

We’re nearly there!  Notice the file src/samples/c_sharp/Docx4NET/DocxToPDF.cs

Click on your project in Solution Explorer, then right click (or hit Alt+Enter) to get the properties pane:

Then set the “startup object” to the DocxToPDF sample class.

Now you can hit Ctrl+F5 (“Start without Debugging”) – you don’t want to debug, since that’s really slow.

You should see some logging in the console window, culminating in “done! Press any key to continue..”

What just happened?  All being well, the sample docx “src\samples\resources\sample-docx.docx” was saved as a PDF “OUT_sample-docx.pdf” in your project directory.

You can modify src/samples/c_sharp/Docx4NET/DocxToPDF.cs to read your own test docx.
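If you would rather start from a stripped-down version than edit the sample, the core of the conversion is just two calls: load the docx, then write it out as PDF. The following is a minimal sketch, not a copy of the shipped DocxToPDF.cs. The namespace, class name and file paths are hypothetical; the two docx4j calls are the ones that appear in the sample.

```csharp
using System;
using org.docx4j.openpackaging.packages;

namespace Docx4jNetSketch            // hypothetical namespace
{
    class ConvertMyDocx              // hypothetical class name
    {
        static void Main(string[] args)
        {
            // Hypothetical paths; point these at your own files.
            string fileIN = @"C:\temp\my-test.docx";
            string fileOUT = @"C:\temp\my-test.pdf";

            // Load the docx into docx4j's in-memory package representation.
            WordprocessingMLPackage wordMLPackage =
                WordprocessingMLPackage.load(new java.io.File(fileIN));

            // docx4j writes to a Java-style output stream (available via IKVM).
            java.io.FileOutputStream fos =
                new java.io.FileOutputStream(new java.io.File(fileOUT));
            try
            {
                org.docx4j.Docx4J.toPDF(wordMLPackage, fos);
            }
            finally
            {
                // Close the stream even if the conversion throws.
                fos.close();
            }

            Console.WriteLine("Saved " + fileOUT);
        }
    }
}
```

If you add this as a second class alongside the sample, the project will contain two Main methods, so set the startup object to whichever one you want to run.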

A few comments.

XSL FO; Apache FOP. docx4j creates PDF via XSL FO: it generates XSL FO, then uses Apache FOP (v1.1) to convert that XSL FO to PDF.  FOP also supports other output formats (the subject of another blog post).
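When the PDF looks wrong, it can help to inspect the intermediate XSL FO. The sketch below assumes the FOSettings API of the docx4j 3.x Java library (createFOSettings, setWmlPackage, setFoDumpFile, toFO) is exposed unchanged in the .NET DLL. The class name and file paths are hypothetical, and I have not checked this against every docx4j.NET release.

```csharp
using System;
using org.docx4j.openpackaging.packages;

class DumpFoSketch                   // hypothetical class name
{
    static void Main(string[] args)
    {
        // Hypothetical paths; point these at your own files.
        string fileIN = @"C:\temp\my-test.docx";
        string fileOUT = @"C:\temp\my-test.pdf";
        string fileFO = @"C:\temp\my-test.fo";

        WordprocessingMLPackage wordMLPackage =
            WordprocessingMLPackage.load(new java.io.File(fileIN));

        // 'var' avoids spelling out the org.docx4j.convert.out namespace,
        // whose last segment clashes with the C# keyword 'out'.
        var foSettings = org.docx4j.Docx4J.createFOSettings();
        foSettings.setWmlPackage(wordMLPackage);

        // Ask docx4j to save the generated XSL FO as a side effect.
        foSettings.setFoDumpFile(new java.io.File(fileFO));

        java.io.FileOutputStream fos =
            new java.io.FileOutputStream(new java.io.File(fileOUT));
        try
        {
            // Generate the FO, then hand it to FOP to render the output.
            org.docx4j.Docx4J.toFO(foSettings, fos,
                org.docx4j.Docx4J.FLAG_EXPORT_PREFER_XSL);
        }
        finally
        {
            fos.close();
        }

        Console.WriteLine("FO written to " + fileFO);
    }
}
```

The .fo file is plain XML, so you can open it in a text editor to see which formatting objects docx4j produced for a problem paragraph or table.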

Logging, Commons Logging. Logging is via Commons Logging.  In the demo, it is configured programmatically (i.e. in DocxToPDF.cs).  Alternatively, you could do it in app.config.
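Programmatic configuration can look roughly like the sketch below. It assumes the .NET Common.Logging library and its console adapter, which is what the sample's “Hello from Common Logging” line suggests; it is not a copy of the code in DocxToPDF.cs. The class name is hypothetical. Depending on your Common.Logging version, the properties collection may need to be Common.Logging.Configuration.NameValueCollection rather than the System.Collections.Specialized one used here.

```csharp
using System.Collections.Specialized;
using Common.Logging;
using Common.Logging.Simple;

class LoggingSketch                  // hypothetical class name
{
    static void Main()
    {
        // Properties understood by Common.Logging's simple console adapter.
        NameValueCollection props = new NameValueCollection();
        props["level"] = "INFO";           // raise to WARN to quieten docx4j
        props["showDateTime"] = "false";

        // Install the adapter before any docx4j code runs,
        // so that its log output is routed to the console.
        LogManager.Adapter = new ConsoleOutLoggerFactoryAdapter(props);

        ILog log = LogManager.GetCurrentClassLogger();
        log.Info("Hello from Common Logging");
    }
}
```

Setting the level to DEBUG produces far more output, which is useful when a conversion appears to hang or stops without an obvious error.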

OpenXML SDK interop: src/main/c_sharp/Plutext/Docx4NET contains code for converting between a docx4j representation of a docx package, and the Open XML SDK’s representation.

Improving PDF support. To improve the quality of the PDF output, typically you’d make the improvement to docx4j first (i.e. the Java version), then create a new DLL using the ant build target dist.NET.  docx4j is on GitHub, and is most easily set up using Maven (see earlier blog post).

Help/support/discussion. You can post in the docx4j PDF output forum, or on StackOverflow (be sure to use the tag docx4j, plus some/all of c#, docx, pdf, fop, xslfo as you think appropriate).  Please don’t cross-post to both!


6 Responses so far

  1. 1

    TomHashNL said,

    September 13, 2014 @ 1:04 am

    It works great in a Console application, but I can’t get it to work in my ASP.NET app :/.
    I’m getting a: “org/docx4j/convert/out/fo/docx2fo.xslt not found via classloader.”
    Could you maybe point me in the right direction, I am not used to java stuff.

    Many thanks in advance!

    Cheers,

    Thomas

  2. 2

    docx4java aka docx4j – OpenXML office documents in Java » Blog Archive » Docx4jHelper Word AddIn said,

    December 4, 2014 @ 11:46 am

    […] is all feasible because docx4j can run as a DLL in a .NET project, thanks to […]

  3. 3

    sam said,

    January 15, 2015 @ 7:48 am

    I’m getting this message and it’s not converting.

    [INFO] docx4j.NET.samples.DocxToPDF – Hello from Common Logging
    [INFO] org.docx4j.jaxb.Context – java.vendor=Jeroen Frijters
    [INFO] org.docx4j.jaxb.Context – java.version=1.7.0
    [INFO] org.docx4j.jaxb.Context – No MOXy JAXB config found; assume not intended
    ..
    [INFO] org.docx4j.jaxb.NamespacePrefixMapperUtils – Using NamespacePrefixMapper
    SunInternal, which is suitable for Java 6
    [INFO] org.docx4j.jaxb.Context – Using Java 6/7 JAXB implementation

  4. 4

    mr-box said,

    March 20, 2015 @ 4:27 am

    While the above demo webapp link works, the sample project from NuGet stucks with 100% CPU on

    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
    .load(new java.io.File(fileIN));

  5. 5

    mr-box said,

    March 20, 2015 @ 4:42 am

    After waiting for about 2 min it stopped with

    java.lang.IllegalArgumentException: Can’t clone a null argument

    on

    org.docx4j.Docx4J.toPDF(wordMLPackage, fos);

    in DocxToPDF.cs

  6. 6

    amit said,

    August 6, 2015 @ 5:22 pm

    I’m getting this message and it’s not converting.

    [INFO] docx4j.NET.samples.DocxToPDF – Hello from Common Logging
    [INFO] org.docx4j.jaxb.Context – java.vendor=Jeroen Frijters
    [INFO] org.docx4j.jaxb.Context – java.version=1.7.0
    [INFO] org.docx4j.jaxb.Context – No MOXy JAXB config found; assume not intended
    ..
    [INFO] org.docx4j.jaxb.NamespacePrefixMapperUtils – Using NamespacePrefixMapper
    SunInternal, which is suitable for Java 6
    [INFO] org.docx4j.jaxb.Context – Using Java 6/7 JAXB implementation
