Archive for the ‘docx4j’ Category

docx4j from GitHub in Eclipse – 3 years on

June 16th, 2015 by Jason

In May 2012 we posted docx4j-from-github-in-eclipse. That was more than 3 years ago now, so it’s about time to update that walkthrough :-)

This post is about getting the docx4j source code set up in Eclipse, so you can not just use it, but easily study it as well (and submit pull requests!).  If you have no interest or need to do that, please see hello-maven-central (if you’re already using a recent Eclipse, you can start at step 4) and/or docx4j-3-0-and-maven (but do use our current version 3.2.x).

Preliminaries – JDK

Make sure you have the JDK installed; Java 6 or later.  The JRE alone is not enough, since it doesn’t include a compiler (javac).
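If you’re not sure whether the Java you have is a JDK or just a JRE, here is a minimal sketch of one way to check from Java itself.  It relies on the standard javax.tools API, which returns null when no compiler is available; the class name JdkCheck is just something made up for this example.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Reports whether the running Java installation includes a compiler.
class JdkCheck {

    // ToolProvider returns null when no system compiler is available,
    // which is what you typically get on a JRE-only install.
    static boolean hasCompiler() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }

    // Turns the result into a human readable message.
    static String describe(boolean hasCompiler) {
        return hasCompiler
                ? "JDK detected: a Java compiler is available"
                : "No compiler found: this looks like a JRE, so install a JDK";
    }

    public static void main(String[] args) {
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println(describe(hasCompiler()));
    }
}
```

Run it with the same Java that Eclipse will be using; the java.home line tells you which installation that actually is.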

Preliminaries – Eclipse

Install Eclipse.  These days, the basic package has everything you need (ie git and maven support):

Git & GitHub

GitHub is docx4j’s authoritative source repository.  Eclipse now includes a git client. (If you have an older Eclipse, you can install eGit) However, it is still handy to have other git clients installed:

  • on Linux (listed first, given git’s provenance), install git using your distribution’s package manager
  • on Windows, the Git BASH shell is handy; as is Atlassian’s SourceTree
  • on OSX, ditto

Clone or Fork?

With Git, there is a difference between cloning and forking.

  • Cloning gives you a copy of the source code you can work on, but without more, no easy way to contribute changes back.
  • Forking sets you up with the source code, and makes it easy to contribute changes back.

If you think you might be making changes to the docx4j source code, you’re probably best to create a fork on GitHub right from the start.

To create a fork, log in to GitHub, visit the docx4j repository page, then press the “Fork” button.

Choose your poison

There are 3 steps to installing docx4j:

  1. clone the docx4j repo
  2. install its dependencies
  3. install docx4j project in Eclipse

You can do these 3 steps entirely within Eclipse, but Eclipse by default doesn’t give much feedback as to what it’s doing, so you might wonder whether it’s still working properly.

Since it’s just as easy (or easier) to use the command line, I’ll show that way first:-

Command Line Approach

To do it this way, you’ll need:

  • a git shell, and
  • Maven

Both of these are worth having in any case.

Step 1. To clone docx4j from your git shell, use the github URL for docx4j (your fork or Plutext’s):

$ git clone -b master --single-branch <docx4j-github-url> docx4j
Cloning into 'docx4j'...
remote: Counting objects: 42008, done.
remote: Compressing objects: 100% (58/58), done.
remote: Total 42008 (delta 23), reused 7 (delta 0), pack-reused 41946
Receiving objects: 100% (42008/42008), 61.03 MiB | 128.00 KiB/s, done.
Resolving deltas: 100% (25108/25108), done.

You should now have a docx4j directory, containing the docx4j source code.

Step 2. Next, to get docx4j’s dependencies, you’ll need Maven.

So first, install Maven (if you don’t have it already).  Please see the installation instructions on the Maven website (actually, you’ve already got Maven in Eclipse, but it’s a bit hard to use from the command line).

Now you can go into your docx4j directory, and type:

mvn install -DskipTests=true

You’ll see Maven download docx4j’s dependencies.
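As well as downloading dependencies, mvn install puts the docx4j jar it builds into your local Maven repository.  If you want to confirm that happened, here is a small sketch which works out where the jar should be.  It assumes the default repository location (~/.m2/repository) and the coordinates org.docx4j:docx4j; the version is whatever your pom.xml declares (3.2.0 below is only a placeholder), and the class name is made up for this example.

```java
import java.io.File;

class LocalRepoPath {

    // Builds the conventional location of an installed artifact inside a
    // Maven local repository: groupId dots become directories, followed by
    // artifactId/version/artifactId-version.jar.
    static String jarPath(String repoRoot, String groupId, String artifactId, String version) {
        return repoRoot + "/" + groupId.replace('.', '/')
                + "/" + artifactId
                + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        // Assumptions: default repository location, and a version passed on
        // the command line (use the one declared in docx4j's pom.xml).
        String repoRoot = System.getProperty("user.home") + "/.m2/repository";
        String version = args.length > 0 ? args[0] : "3.2.0";
        File jar = new File(jarPath(repoRoot, "org.docx4j", "docx4j", version));
        System.out.println(jar + (jar.isFile() ? " exists" : " not found"));
    }
}
```

If you have configured a different local repository in your Maven settings, adjust repoRoot accordingly.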

Step 3. Now you are ready to start Eclipse.

Because docx4j includes Eclipse project definition files, you can import the docx4j project.

From the File menu, click Import, then Existing Projects into Workspace:

Browse to your docx4j directory:

Then click Finish.

Now the project should be set up correctly.  If you see errors, please refer further below for troubleshooting.

Eclipse only approach

Step 1. clone the docx4j git repo in Eclipse; for this you need the Git Repositories View:

Window > Show View > Other > Git > Git Repositories View

Click “Clone a Git repository” then enter the URI for docx4j (your fork, or Plutext’s), then click Next.

The master branch is probably all you need (though Eclipse will probably fetch all the others at some point anyway!)

Step 2. On the next screen, you can tick “Import all existing projects after clone finishes”.  (If you don’t do that, you’ll have to manually File > Import, then Existing Projects into Workspace, as explained above.)

Step 3. Eclipse will now start building the project; first Maven will get the dependencies.

This may take a while … to see what Eclipse is doing while it displays the status “Building workspace”, from the Console view, click the drop down to see the Maven Console:

There you can watch it downloading stuff.

You can also look at the Progress view.

When it’s all done, you should have a docx4j project there and ready to go!


Troubleshooting

I don’t cover issues with git clone or maven here; just issues with Eclipse.

If Eclipse has a problem with your docx4j project, you’ll see an exclamation mark:

You can see further info in the Problems view; the most likely problem is that your Java is misconfigured:-

To fix this, select the docx4j project, then press Alt-Enter to go into its properties.

Then click Java Build Path, then the Libraries tab.

Do you see a red cross next to JRE System Library, as above?

If so click on the JRE System Library entry to select it, then click the Remove button.

Next click Add Library, then JRE System Library, then add one (1.6 or above).

Note the warning:

That’s OK, we changed the JRE on the Java Build Path up above.
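To double check which Java the project ended up with, you could run something like the following as a Java Application from inside the docx4j project.  It is only a sketch: it reads the standard java.specification.version system property and reduces it to a major version number; the class and method names are made up for this example.

```java
class JavaVersionCheck {

    // Converts a java.specification.version value into a major version number.
    // Older releases report "1.6", "1.7", "1.8"; Java 9 and later report "9", "11", ...
    static int majorVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        if (dot >= 0) {
            v = v.substring(0, dot);
        }
        return Integer.parseInt(v);
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        System.out.println("Running on Java " + major + " from " + System.getProperty("java.home"));
        System.out.println(major >= 6 ? "OK: 1.6 or above" : "Too old: choose a 1.6+ JRE");
    }
}
```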

Hello World

Now you are ready to run some docx4j code.

A good place to start is to run the CreateWordprocessingMLDocument sample.

Use docx4j in your own project

To use docx4j in your own project, there are 2 approaches:

  • the Maven way.  If you’re planning to use Maven, you just specify docx4j as a dependency, and if the version matches (look in pom.xml), it’ll use your docx4j project (assuming workspace resolution is switched on).  Please see hello-maven-central (if you’re already using a recent Eclipse, you can start at step 4) and/or docx4j-3-0-and-maven (but do use the version specified in pom.xml)
  • or, via the Java Build Path > Projects tab.

Docx4jHelper Word AddIn

December 4th, 2014 by Jason

The dream:

  • View Open XML right from within Word, and see what happens when you edit it.
  • Or generate corresponding docx4j Java code, with deep links into the corresponding docx4j source code and Open XML spec.

Regular users of docx4j will be aware of our webapp, which amongst other things, generates docx4j Java code for the specified Open XML in your sample docx/pptx/xlsx.

The webapp is useful, but it has a few drawbacks:

  • you have to upload your docx/pptx/xlsx, which takes time
  • if your docx/pptx/xlsx contains sensitive data, you probably want to remove that first
  • the webapp might be down

To address these issues, we’re now offering the code gen functionality as a Word AddIn.

If you install the Word AddIn, this means you can now generate code without your docx leaving your computer.

This is all feasible because docx4j can run as a DLL in a .NET project, thanks to IKVM!

Where to get it

You can download the installer.  After you complete the landing form (using your corporate email address, not gmail etc), you’ll be sent a download link.

Getting Started

After a successful installation, after restarting Word, you should see a “Docx4j” menu, containing:

To generate code, first press the “Load Helper” button.

You’ll see the following form:

It’s inviting you to start a local web server which will run the same code as the existing webapp.  Just choose a port you aren’t already using.  If for some reason you want to browse using Internet Explorer (as opposed to your default browser), check the box.
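If you aren’t sure whether a port is already in use, one rough way to find out is to try to bind to it yourself.  Here is a minimal sketch in Java (nothing to do with the AddIn’s own code; the class name is made up, and 8080 is just an example port):

```java
import java.io.IOException;
import java.net.ServerSocket;

class PortCheck {

    // Returns true if nothing is currently listening on the port, judged by
    // whether we can bind a server socket to it ourselves.
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8080; // 8080 is an arbitrary example
        System.out.println("Port " + port + (isFree(port) ? " is free" : " is already in use"));
    }
}
```

The socket is closed again straight away, so a port reported as free here is still available for the helper to use.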

It’ll take a little while to start the server; you’ll see a dialog when it’s started.

Now you can generate code.  To do so, select something in your docx, then click the “Generate Code” button.

After a while, a window will open in your web browser, and you’ll see:

That’s the view of the docx package, which will be familiar if you’ve used the webapp.   For how to generate code from here, see our earlier post.

Code generating is done on your computer.  (But note, the links on that page to docx4j source code and the OpenXML spec are external links)

What about the “Edit OpenXML” button?

If you select something in your docx, then click that button, after a while (maybe 30 secs the first time!), you’ll see the corresponding XML in an editor window:

You can go ahead and edit it, then click the “Apply” button.

If Word likes your XML, you’ll see your changes on the document surface.  Ctrl Z should work for undo.

So there are 2 ways to see the underlying XML

The first way we described uses your web browser; the second is a Windows Form.

These two views have different features; maybe a later release will unify them?

What about pptx, xlsx?

There’s no reason in principle we couldn’t make a similar AddIn for Powerpoint and Excel.  In fact, we plan to make these, once any teething issues have been ironed out in the WordAddIn.

In the meantime, for pptx and xlsx, you can continue to use the webapp.

Help, Suggestions and other Discussion

If you are a Plutext customer experiencing an issue, please email us.

Otherwise, please check the Docx4jHelper AddIn forum.

We’ve got some ideas for where the AddIn goes from here, but we’d love to hear yours.

docx to PDF in C#/.NET

September 5th, 2014 by Jason

How to convert docx to PDF without using Microsoft Word?

If your docx is mainly text, tables and images, docx4j.NET may work well for you.  Edit (Feb 2015): if not, you may be interested in our new commercial high fidelity PDF renderer.

docx4j.NET is open source (Apache software license v2), identical to the Java version, but made into a DLL using IKVM.  Currently we’re at v3.2.0, released last week.

It is easy to test; you can upload your docx to the docx4j demo webapp

Or with very little effort, you can run it from a sample project in Visual Studio.  It’s very easy, because docx4j.NET is in the NuGet repository:

To create your sample project:

  1. make sure you have NuGet Package Manager installed
    • for VS 2012 and later, it’s installed by default
    • for VS 2010, NuGet is available through the Visual Studio Extension Manager; see the above link.
  2. create a new project in Visual Studio (File > New > Project).  A Console Application is fine.  I chose that from the .NET 3.5 list.
  3. from the Tools menu, choose NuGet Package Manager > Package Manager Console
  4. type Install-Package docx4j.NET

You should see something like:

And then, your project/solution will be populated to look like:

We’re nearly there!  Notice the file src/samples/c_sharp/Docx4NET/DocxToPDF.cs

Click on your project in Solution Explorer, then right click (or hit Alt+Enter) to get the properties pane:

Then set the “startup object” as shown in the above image.

Now you can hit Ctrl+F5 (“Start without Debugging”) – you don’t want to debug, since that’s really slow.

You should see some logging in the console window, culminating in “done! Press any key to continue..”

What just happened?  All being well, the sample docx “src\samples\resources\sample-docx.docx” was saved as a PDF “OUT_sample-docx.pdf” in your project directory.

You can modify src/samples/c_sharp/Docx4NET/DocxToPDF.cs to read your own test docx.

A few comments.

XSL FO; Apache FOP. docx4j creates PDF via XSL FO.  It generates XSL FO, then uses Apache FOP (v1.1) to convert the XSL FO to PDF.  FOP also supports other output formats (the subject of another blog post).

Logging, Commons Logging. Logging is via Commons Logging.  In the demo, it is configured programmatically (ie in  DocxToPDF.cs).  Alternatively, you could do it in app.config.

OpenXML SDK interop: src/main/c_sharp/Plutext/Docx4NET contains code for converting between a docx4j representation of a docx package, and the Open XML SDK’s representation.

Improving PDF support. To improve the quality of the PDF output, typically you’d make the improvement to docx4j first (ie the Java version), then create a new DLL using the ant build target dist.NET.   docx4j is on GitHub, and is most easily setup using Maven (see earlier blog post).

Help/support/discussion. You can post in the docx4j PDF output forum, or on StackOverflow (be sure to use tag docx4j, plus some/all of c#, docx, pdf, fop, xslfo as you think appropriate).  Please don’t cross post at both!

docx4j in a single page

May 15th, 2013 by Jason

Here’s a single A4 page reference/overview of docx4j aka a cheat sheet, in PDF or PNG format.

This one is focused on docx files (WordprocessingML).

I’ll create something similar for pptx and xlsx over coming days.

docx4j/pptx/xlsx online code generation

May 15th, 2013 by Jason

Just launched is our docx4j webapp.

You should be able to see it in the menu at the top right of this website (if not, reload the web page…).

There are three things you can do with it right now:

  • Explore your docx/pptx/xlsx and its representation in docx4j
  • Convert docx to PDF or XSL FO
  • Merge docx files (eg cover letter plus contract) into a single docx, using Plutext’s MergeDocx. Or the same thing for pptx files, using MergePptx.

Here I want to focus on the first of these.

After you’ve uploaded your docx/pptx/xlsx, the first thing you see is like docx4j’s PartsList sample:

Here, I’ll click in the left hand column to look at the main document part, document.xml

When I do that, I see the XML:

No surprises there.

But notice the hyperlinks.  Here I’ll just click on the first w:p.

What you get back, is Java source code to create that complete structure:-

As you can see from the image above, both styles of code (as described in docx4j’s Getting Started document) are produced for you.  With a bit of luck, you can cut/paste either into your IDE (Eclipse or whatever), and just run with it!

To actually see the created object in an Office document, you’ll still need to add the created object to a part.  See Getting Started, or the cheat sheet for how to do that.

I hope this helps you to create/modify your Office documents more efficiently, with docx4j!

Do let us know what you think in the comments, or in docx4j’s forums.

docx4j 2.8.0 released

May 24th, 2012 by Jason

I’m pleased to say that docx4j 2.8.0 is now released.

What is docx4j?

docx4j is an open source (Apache v2) library for working with docx, pptx, and xlsx files, based around JAXB.

What’s new?

The headline feature is XHTML import.  docx4j can convert XHTML to Word document content, formatting it based on the CSS.  Images and tables are supported. See the ConvertInXHTMLDocument and ConvertInXHTMLFragment samples.

Where do you get it?

See our downloads page or:

Binaries: You can download a jar alone or a zip with all deps or pick and choose.  If you’re upgrading from 2.7.1, you need the  docx4j jar and:

Source: the source code is on GitHub; here’s how to set up the docx4j source code

Maven: docx4j 2.8.0 is in Maven Central.  Here is a guide to getting started (where it says 2.7.1, just use 2.8.0).

Getting Started

See the “Getting Started” guide, in html, docx or pdf flavours.

There is lots of sample code here (freshly reviewed for 2.8.0).


If you are looking for help (and have read the Getting Started Guide :-) ), you can post in our forums, or on Stack Overflow (where there is a docx4j tag).

Thanks to our contributors

A number of contributions have made this release what it is; thanks very much to those who contributed.

Contributors to this release and a more complete list of changes may be found in README.txt

Thanks also to those who have +1’d pages on this website, or tweeted or blogged about docx4j, which is critical to expanding the docx4j community!

docx – internal hyperlinks

April 11th, 2012 by Jason

There have been a couple of posts on the forum lately regarding adding hyperlinks to other parts of a docx.

This blog post walks you through the generic process for investigating an issue like this.

First, create a sample docx in Word which exhibits the issue of interest.

Here I’m interested in hyperlinks to a heading, and to a bookmark. So see this docx.

Second, look inside it (it’s a zip file). For the link to the heading, document.xml contains a w:p containing:

      <w:hyperlink w:anchor="_My_heading" w:history="1">
        <w:r>
          <w:rPr><w:rStyle w:val="Hyperlink"/></w:rPr>
          <w:t>My heading</w:t>
        </w:r>
      </w:hyperlink>

The heading itself is automatically given a bookmark:

    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:bookmarkStart w:id="0" w:name="_My_heading"/>
      <w:bookmarkEnd w:id="0"/>
      <w:r><w:t>My heading</w:t></w:r>
    </w:p>

For the link to my bookmark, Word 2010 used the legacy field formulation:

      <w:r><w:fldChar w:fldCharType="begin"/></w:r>
      <w:r><w:instrText xml:space="preserve"> HYPERLINK  \l "bm1" </w:instrText></w:r>
      <w:r><w:fldChar w:fldCharType="separate"/></w:r>
      <w:r w:rsidRPr="00D16ABA">
        <w:rPr><w:rStyle w:val="Hyperlink"/></w:rPr>
        <!-- … -->
      </w:r>
      <w:r><w:fldChar w:fldCharType="end"/></w:r>

Third, what rels are involved? To answer this, I run the docx through docx4j’s PartsList sample. It shows me that these hyperlinks don’t create any rels. Alternatively, to see this, you could have looked at the rels part when you unzipped the docx.
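If you’d rather check the rels part programmatically without docx4j, here is a minimal sketch using only the JDK.  It assumes the main document part is at the usual location (word/document.xml, so its rels are in word/_rels/document.xml.rels), and it looks for the standard hyperlink relationship type, which is what Word uses for external hyperlinks.  The class and method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RelsInspector {

    // Conventional location of the main document part's relationships.
    static final String MAIN_RELS = "word/_rels/document.xml.rels";
    static final String HYPERLINK_TYPE =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink";

    // Returns the Type of every Relationship in the main document's rels part
    // (an empty list if the docx has no such part).
    static List<String> relationshipTypes(InputStream docx) throws Exception {
        List<String> types = new ArrayList<String>();
        ZipInputStream zip = new ZipInputStream(docx);
        try {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (!MAIN_RELS.equals(e.getName())) {
                    continue;
                }
                // Copy the entry out first, so the XML parser cannot close the zip stream.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                for (int n = zip.read(chunk); n != -1; n = zip.read(chunk)) {
                    buf.write(chunk, 0, n);
                }
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                dbf.setNamespaceAware(true);
                Document doc = dbf.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(buf.toByteArray()));
                NodeList rels = doc.getElementsByTagNameNS("*", "Relationship");
                for (int i = 0; i < rels.getLength(); i++) {
                    types.add(((Element) rels.item(i)).getAttribute("Type"));
                }
            }
        } finally {
            zip.close();
        }
        return types;
    }

    static boolean hasHyperlinkRel(List<String> types) {
        return types.contains(HYPERLINK_TYPE);
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new FileInputStream(args[0]); // path to your sample docx
        try {
            List<String> types = relationshipTypes(in);
            for (String t : types) {
                System.out.println(t);
            }
            System.out.println("hyperlink rels present: " + hasHyperlinkRel(types));
        } finally {
            in.close();
        }
    }
}
```

For a docx containing only internal hyperlinks like the ones above, you’d expect the last line to report false.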

So we can see that adding an internal hyperlink to a heading requires that it be bookmarked first. Once you have a bookmark, you use a w:hyperlink to refer to the bookmark by name (not id). Doesn’t look like there is any reason to use fields for this.

Here’s a suitable method:

	// needs: import org.docx4j.XmlUtils; import org.docx4j.wml.P.Hyperlink;
	/**
	 * Create a Hyperlink object, which is suitable for adding to a w:p
	 * @param bookmarkName name of the bookmark to link to
	 * @param linkText text to display for the link
	 * @return the Hyperlink, or null if the XML could not be unmarshalled
	 */
	public static Hyperlink hyperlinkToBookmark(String bookmarkName, String linkText) {
		try {
			String hpl = "<w:hyperlink w:anchor=\"" + bookmarkName + "\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" " +
            "w:history=\"1\" >" +
            "<w:r>" +
            "<w:rPr>" +
            "<w:rStyle w:val=\"Hyperlink\" />" +  // TODO: enable this style in the document!
            "</w:rPr>" +
            "<w:t>" + linkText + "</w:t>" +
            "</w:r>" +
            "</w:hyperlink>";

			return (Hyperlink)XmlUtils.unmarshalString(hpl);
		} catch (Exception e) {
			// Shouldn't happen
			return null;
		}
	}
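One thing to be aware of: the method above concatenates bookmarkName and linkText straight into the XML, so a value containing & or < would make the string ill-formed (and you’d get null back).  Here is a sketch of building the same markup with the inputs escaped first.  It is not part of docx4j; the class and method names are made up, and it uses the standard WordprocessingML namespace URI.

```java
class HyperlinkXml {

    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds w:hyperlink markup pointing at a bookmark, escaping both inputs.
    static String hyperlinkXml(String bookmarkName, String linkText) {
        return "<w:hyperlink w:anchor=\"" + escape(bookmarkName) + "\" xmlns:w=\"" + W_NS + "\" "
                + "w:history=\"1\">"
                + "<w:r>"
                + "<w:rPr><w:rStyle w:val=\"Hyperlink\"/></w:rPr>"
                + "<w:t>" + escape(linkText) + "</w:t>"
                + "</w:r>"
                + "</w:hyperlink>";
    }

    public static void main(String[] args) {
        System.out.println(hyperlinkXml("bm1", "Terms & conditions"));
    }
}
```

The resulting string could then be passed to XmlUtils.unmarshalString, exactly as in the method above.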

We can test it by altering the BookmarkAdd sample to add a link:

Hyperlink h = MainDocumentPart.hyperlinkToBookmark(bookmarkName, "link to bookmark");
wordMLPackage.getMainDocumentPart().addParagraphOfText("some text").getContent().add(h);

then checking the result opens in Word ok.

That’s all. Added to docx4j in revision 1777.

Hello Maven Central

October 29th, 2011 by Jason

With version 2.7.1, docx4j – a library for manipulating Word docx, Powerpoint pptx, and Excel xlsx xml files in Java – and all its dependencies, are available from Maven Central.

This makes it really easy to get going with docx4j.  With Eclipse and m2eclipse installed, you just add docx4j, and you’re done.  No need to mess around with manually installing jars, setting class paths etc.

This post demonstrates that, starting with a fresh OS (Win 7 is used, but these steps would work equally well on OSX or Linux).

Step 1 – Install the JDK

For the purposes of this article, I used JDK 7, but docx4j works with Java 6 and 1.5.

Step 2 – Install Eclipse Indigo (3.7.1)

I normally download the version for J2EE developers. Unzip it and run eclipse.

Step 3 – Install m2eclipse.

In Eclipse, click Help > Install New Software.

Type the m2eclipse update site URL in the “Work with” field as shown:

then follow the prompts.

Step 4 – Create your Maven project

In Eclipse, File > New > Project.., then choose Maven project

You should see:

Check “Create a simple project (skip archetype selection)” then press next.

Allocate group and artifact id (what you choose as your artifact id will become the name of your new project in Eclipse):

Press finish

This will create a project with directories using Maven conventions:

(Note: If your starting point is a new or existing Java project in Eclipse, you can right click on the project, then choose Configure > Convert to Maven project)

Step 5 – Add docx4j to your POM

Double Click on pom.xml

Next click on the dependencies tab, then click the “add dependency” button, and enter the docx4j coordinates as shown in the image below:

The result is this pom:

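<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">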

Ctrl-S to save it.

m2eclipse may take some time to download the dependencies.

When it has finished, you should be able to see them:

Step 6 – Create HelloMavenCentral.java

If you made a Maven project as per step 4 above, you should already have src/main/java on your build path.

If not, create the folder and add it.

Now add a new class:

import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class HelloMavenCentral {

	public static void main(String[] args) throws Exception {
		WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();

		// Add a paragraph using the "Title" style
		wordMLPackage.getMainDocumentPart()
			.addStyledParagraphOfText("Title", "Hello Maven Central");

		wordMLPackage.getMainDocumentPart().addParagraphOfText("from docx4j!");

		// Now save it
		wordMLPackage.save(new java.io.File(System.getProperty("user.dir") + "/helloMavenCentral.docx") );
	}
}

Step 7 – Click Run

When you click run, all being well, a new docx called helloMavenCentral.docx will be saved.

You can open it in Word (or anything else which can read docx), or unzip it to inspect its contents.
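If you’d like to peek inside without unzipping by hand, here is a minimal sketch which lists the entries in the docx using only the JDK’s zip support.  The class name is made up for this example; by default it looks for helloMavenCentral.docx in the working directory, the same place the code above saves to.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class DocxEntries {

    // Returns the names of all entries in the docx (zip), in the order stored.
    static List<String> list(InputStream docx) throws IOException {
        List<String> names = new ArrayList<String>();
        ZipInputStream zip = new ZipInputStream(docx);
        try {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } finally {
            zip.close();
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0
                ? args[0]
                : System.getProperty("user.dir") + "/helloMavenCentral.docx";
        List<String> names = list(new FileInputStream(path));
        for (String name : names) {
            System.out.println(name);
        }
        // The main document part normally lives here.
        System.out.println("has word/document.xml: " + names.contains("word/document.xml"));
    }
}
```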

Step 8 – Adding docx4j.properties

One final thing. If you plan on creating documents from scratch using docx4j, it is useful to set paper size etc, via docx4j.properties. Put something like the following on your classpath:

# Page size: use a value from org.docx4j.model.structure.PageSizePaper enum
# eg A4, LETTER
# Page size: use a value from org.docx4j.model.structure.MarginsWellKnown enum

# Page size: use a value from org.pptx4j.model.SlideSizesWellKnown enum
# eg A4, LETTER

# These will be injected into docProps/app.xml
# if App.Write=true
# of the form XX.YYYY where X and Y represent numerical values

# These will be injected into docProps/core.xml


# If you haven't configured log4j yourself
# docx4j will autoconfigure it.  Set this to true to disable that
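To check that your properties file really is being found on the classpath, you can try loading it yourself.  The following is only a sketch using plain java.util.Properties, not docx4j’s own loading code; it assumes the file is named docx4j.properties and that docx4j.PageSize is the page size key (check the properties file shipped with your docx4j version).  The class name is made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

class Docx4jPropertiesPeek {

    // Looks up a file by name on the classpath and loads it as Properties.
    // Returns an empty Properties object if the file is not on the classpath.
    static Properties loadFromClasspath(String name) throws IOException {
        Properties props = new Properties();
        InputStream in = Docx4jPropertiesPeek.class.getClassLoader().getResourceAsStream(name);
        if (in != null) {
            try {
                props.load(in);
            } finally {
                in.close();
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = loadFromClasspath("docx4j.properties");
        if (props.isEmpty()) {
            System.out.println("docx4j.properties not found on the classpath (or it is empty)");
        }
        // Assumed key name; "(not set)" is printed if it is absent.
        System.out.println("docx4j.PageSize = " + props.getProperty("docx4j.PageSize", "(not set)"));
    }
}
```

In a Maven project, src/main/resources is a convenient place to put the file, since it ends up on the classpath.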

And that’s it. For more information on docx4j, see our Getting Started document.

Please click the +1 button if you found this article helpful.

docx4j 2.7.0 released

July 8th, 2011 by Jason

I’m pleased to announce the release today of docx4j 2.7.0.

What is docx4j?

docx4j is an open source (Apache v2) library for creating, editing, and saving OpenXML “packages”, including docx, pptx, and xlsx.  It is similar to Microsoft’s OpenXML SDK, but for Java rather than .NET.   It uses JAXB to create the Java objects out of the OpenXML parts.

Notable features for docx include export as HTML or PDF, and CustomXML databinding for document generation (including our OpenDoPE convention support for processing repeats and conditions).

The docx4j project started in October 2007.

What’s new?

This is mainly a maintenance release; things of note include:

  • Improvements to Maven build
  • ContentAccessor interface
  • AlteredParts: identify parts in this pkg which are new or altered; Patcher
    which adds new or altered parts.
  • Support for .glox SmartArt package (/src/glox/)
  • JAXB RI 2.2.3 compatibility
  • OpenDoPE support improvements

Where do you get it?

Binaries: You can download a jar alone or a tar.gz with all deps or pick and choose.

Source: Checkout the source from SVN (use the pom.xml file to satisfy the dependencies eg with m2eclipse, or download them from one of the links above)

Maven: Please see forum for details (since XML doesn’t paste nicely here right now).

Dependency changes

Antlr is now required for OpenDoPE processing; this gives us better XPath processing.  The required jars are antlr-runtime-3.3.jar and stringtemplate-3.2.1.jar.

Getting Started

See the “Getting Started” guide.

Thanks to our contributors

A number of contributions have made this release what it is; thanks very much to those who contributed.

Contributors to this release and a more complete list of changes may be found in README.txt

A request to docx4j users

If you are happily using docx4j, it would be great if you could reply to this post with some words of recommendation for others who might be wondering whether docx4j is a good choice. I know there are thousands of you out there :-)

Some users have been kind enough to make such statements already; these may be found on the trac homepage.

Of course, there are a number of other ways you can contribute back.  Please consider doing so, especially if you think you might find yourself looking for support from volunteers in the docx4j forums.

Feedback on docx4j 2.7.0 release candidate?

June 28th, 2011 by Jason

docx4j 2.7.0 release candidate is now available.

This will form the basis of the 2.7.0 release. In fact, unless there are significant issues over the next week or so, this will become the 2.7.0 release! So please try it out and report back, positive or negative…

It is mainly a maintenance release, but things of note include:

  • Improvements to Maven build
  • ContentAccessor interface
  • AlteredParts: identify parts in this pkg which are new or altered; Patcher
    which adds new or altered parts.
  • Support for .glox SmartArt package (/src/glox/)
  • JAXB RI 2.2.3 compatibility

For contributors to this release and a more complete list of changes, please see … README.txt

There are 2 new dependencies (required for OpenDoPE processing): antlr-runtime-3.3.jar and stringtemplate-3.2.1.jar.  For convenience, copies of these can be found in the same dir as the rc jar.

Thanks very much to everyone who contributed to this release (candidate!).

And please consider clicking one of the buttons below to circulate news of the release.