Archive for May, 2012

docx4j 2.8.0 released

May 24th, 2012 by Jason

I’m pleased to say that docx4j 2.8.0 is now released.

What is docx4j?

docx4j is an open source (Apache v2) library for working with docx, pptx, and xslx files, based around  JAXB.

What’s new?

The headline feature is XHTML import.  docx4j can convert XHTML to Word document content, formatting it based on the CSS.  Images and tables are supported. See the ConvertInXHTMLDocument and ConvertInXHTMLFragment samples.

Where do you get it?

See our downloads page or:

Binaries: You can download a jar alone or a zip with all deps or pick and choose.  If you’re upgrading from 2.7.1, you need the  docx4j jar and:

Source: the source code is on GitHub at https://github.com/plutext/docx4j; here’s how to setup docx4j source code

Maven: docx4j 2.8.0 is in Maven Central.  Here is a guide to getting started (where it says 2.7.1, just use 2.8.0).

Getting Started

See the “Getting Started” guide, in html docx or pdf flavours.

There is lots of sample code here (freshly reviewed for 2.8.0).

Support

If you are looking for help (and have read the Getting Started Guide :-) ), you can post in our forums, or on Stack Overflow (where there is a docx4j tag).

Thanks to our contributors

A number of contributions have made this release what it is; thanks very much to those who contributed.

Contributors to this release and a more complete list of changes may be found in README.txt

Thanks also to those who have +1’d pages on this website, or tweeted or blogged about docx4j, which is critical to expanding the docx4j community!

docx4j from GitHub in Eclipse

May 18th, 2012 by Jason

This post is old.  Please see instead 2015/06/docx4j-from-github-in-eclipse-5-years-on/

docx4j is now on GitHub!  https://github.com/plutext/docx4j

This should make it easier for users to maintain their own branches (public or private), and contribute improvements back.

As of now, GitHub is the project’s authoritative version control.  We’re no longer updating the existing svn repository.

Its pretty easy to work with docx4j sources in Eclipse. This post shows you how.

First, make sure you have eGit installed in Eclipse.  Install it from here.  On Windows, it is also useful to have msysgit.  Refer elsewhere for how to set these up. Update: there is a GitHub Windows client now (I haven’t tried it) which apparently includes msysgit.

You also need m2eclipse

Assuming you’ve done all that, setting up the docx4j source code is just a few steps.

But first, be aware there is a difference between cloning and forking.  Cloning gives you a copy of the source code you can work on, but without more, no easy way to contribute changes back.  Forking sets you up with the source code, and makes it easy to contribute changes back.

If you think you might be making changes to the docx4j source code, you’re probably best to create a fork on GitHub right from the start.

Step 1 (optional, but recommended): To create a fork, log in to GitHub, visit https://github.com/plutext/docx4j then press the “Fork” button.

Step 2: Create your local repository (git clone)

This can be done from within Eclipse, or using Git Gui (easiest), or Git Bash Shell.

To do it from within Eclipse, File > Import .. > Repositories from GitHub:

If you forked docx4j, find your fork (it might not appear immediately, which is why Git Gui or Git Bash Shell are better for this step), select it, and click next.

If you didn’t fork docx4j, type ‘docx4j’ then press ‘search’, the plutext/docxj repository should come up:

Select plutext/docx4j, then click next.

This creates a local git repository on your computer.

Step 3: Now you need to import that repository into Eclipse as a project:

File > Import .. > Projects from Git

Eclipse should find the existing project settings:

(If it didn’t and you had to use the new projects wizard; be sure to set the file location to wherever your git repository is, rather than letting Eclipse create a new empty project in the workspace)

Now you should have a docx4j project in Eclipse, and it should be properly configured (since the project settings come with the project).

You should be done. But if something isn’t right, you can configure it manually (see further below).

Next steps?  Improve the docx4j source code in Eclipse :-), then Team > Commit, to commit those changes to your local repository.

Made a change which would be useful to others?  If you forked docx4j as per step 1 above, you can push your changes to your repository on GitHub, then send a pull request.

If you didn’t fork docx4j, do that now on GitHub, then configure things locally to push your changes to your repository on GitHub, then you’ll be right to push your changes to your repository on GitHub, then send a pull request.  Other docx4j users will thank you for this :-)

Manual configuration:

Configure > Convert to Maven Project

Properties > Java Compiler > Compiler compliance level: change to 1.6

Java Build Path > Libraries: remove 1.5 system library; Add Library … JRE System Library .. 1.6

Java Build Path > Source: check none of the entries say “Excluded: **” (remove the exclusion)

JAXB can be made to run on Android

May 17th, 2012 by Jason

A customer asked me to prepare a sample Android project which converts docx to HTML.

The result is AndroidDocxToHtml

Since docx4j relies heavily on JAXB, the key to getting it working was getting JAXB – the reference implementation – to run on Android.

Android presents us with a number of challenges:

  1. it won’t let you add a jar which includes classes in the javax.xml namespace (which is where the JAXB API lives)
  2. JAXB uses JAXP 1.3 DatatypeFactory, but Android doesn’t provide it
  3. JAXB uses javax.activation.DataHandler
  4. Dalvik has a limit of 65536 method references per dex file
  5. it doesn’t support package level annotations (which JAXB uses, and which in docx4j supply namespaces)

Ill-advised or mistaken usage of a core class (java.* or javax.*)

You’ll get this message if you try to add a jar containing classes in java.* or the following javax packages:

accessibility crypto imageio management naming
net print rmi security sound sql swing transaction
xml

Android doesn’t provide javax.xml.bind, and it won’t let you add it yourself.  It forces you to re-package it.  Just like on Google AppEngine, until Google eventually added it.

OK, done that; see https://github.com/plutext/jaxb-2_2_5_1/tree/android2 (the 2 in android2 is meaningless)

Repackaging is easy enough; the problem with it is that any library which uses the repackaged code, must also be changed.  In the case of docx4j, this means a new branch, and ongoing maintenance.

JAXB uses JAXP 1.3 DatatypeFactory, but Android doesn’t provide it

com.sun.xml.bind invokes javax.xml.datatype.DatatypeFactory.newInstance, whereupon Android  throws  javax.xml.datatype.DatatypeConfigurationException: Provider org.apache.xerces.jaxp.datatype.DatatypeFactoryImpl not found.

Easy solution: jar it up and provide it.

JAXB uses javax.activation.DataHandler

Easy solution: use the activation and additionnal jars from http://code.google.com/p/javamail-android/downloads/list

Dalvik  limit of 65536 method references per dex file

This is more an issue running docx4j on Android than one related to JAXB, but it is worth noting.  We’re running very close to this limit.  Vote for the issue at http://code.google.com/p/android/issues/detail?id=7147

Also, you may need to give Eclipse more heap space  (symptom is ‘you get Unable to execute dex: Java heap space’).   In eclipse.ini, I used:

-Xms256m

-Xmx4096m

In Eclipse, Windows > Preferences > General > Show Heap Status gives you an entry on the bottom row which is useful.

Just when I thought it would all work…

I found that my XML was not unmarshalling, because it contains namespaces, and for some reason the objects in my JAXB were being read as not having any.

The problem is that Android doesn’t support package annotations: http://code.google.com/p/android/issues/detail?id=16149 (vote), but JAXB needs to read them.  For example:

@javax.xml.bind.annotation.XmlSchema(namespace = “http://schemas.openxmlformats.org/package/2006/relationships”, elementFormDefault = javax.xml.bind.annotation.XmlNsForm.QUALIFIED)

I ended up devising a simple minded way to tell JAXB about these programmatically.  See Context.java.   Hmmm, I probably should have created my own RuntimeInlineAnnotationReader implementation (Google ‘JAXBIntroductions’).

That done, it more or less works (if you need support for other package level annotations, you’ve got a bit more to do).   The re-packaged JAXB is here.  You can build it using ant -f build-repackaged.xml dist

It should work on Android 3 or 4.

To use it, where your code would otherwise import javax.xml.bind, use ae.java.xml.bind.