Archive for the ‘Uncategorized’ Category

docx4j 3.0 beta

November 7th, 2013 by Jason

A beta of docx4j 3.0 is now available, at:

http://www.docx4java.org/docx4j/docx4j-3_0-beta2.zip [link updated 15 Nov]

That zip file contains docx4j, and all its dependencies.  To use it, add all the jars to your classpath.

Alternatively, Maven users can get the beta from our staging repo on GitHub.

<repositories>
    <repository>
        <id>docx4j-mvn-repo</id>
        <url>https://raw.github.com/plutext/docx4j/mvn-repo/</url>
        <snapshots>
            <enabled>true</enabled>
            <updatePolicy>always</updatePolicy>
        </snapshots>
    </repository>
</repositories>

The Maven coordinates for the docx4j 3.0 beta are:


<dependency>
<groupId>org.docx4j</groupId>
<artifactId>docx4j</artifactId>
<version>3.0.0-SNAPSHOT</version>
</dependency>

Our last blog post outlines the major things to be aware of in v3.

Additional notes:

  • For convenience, the zip file also contains docx4j-ImportXHTML, and its dependencies, which are LGPL.  You can delete these if you wish.  They aren’t in the mvn staging repo.
  • To see any logging, you’ll need to add an slf4j implementation.
  • You might want to add a docx4j.properties file

You can find an updated Getting Started guide, in docx and pdf formats, at http://www.docx4java.org/docx4j/.

Feedback welcome.  You can reply here, or to the post in the docx4j forums.

All going smoothly, we’ll progress to final release over the next couple of weeks, so the sooner your feedback, the better!

docx4j 3.0 – what you need to know

October 18th, 2013 by Jason

docx4j 3.0 (beta for which will be available shortly) contains a lot of changes, some big, some small.

Here are the most visible (see our changelog for the rest):

Logging

docx4j 3.0 uses slf4j, instead of log4j.

As the slf4j website puts it:

The Simple Logging Facade for Java (SLF4J) serves as a simple facade or abstraction for various logging frameworks (e.g. java.util.logging, logback, log4j) allowing the end user to plug in the desired logging framework at deployment time.

So you need the slf4j api jar on your classpath:

<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.5</version>
</dependency>

If you want to use log4j, then include it, and:

<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.5</version>
</dependency>

XHTML Import

The XHTML Import functionality is now a separate project on GitHub.

The reason being that its main dependency – Flying Saucer – is licensed under LGPL v2.1 (as opposed to ASL v2, which docx4j’s other dependencies use).

If you want this functionality, you have to add these jars to your classpath.  We’ll update this post with their coordinates once they are in Maven Central.

Docx4j facade

3.0 contains a facade providing clean access to some typical uses of docx4j:
  • Loading a document
  • Saving a document
  • Binding xml to content controls in a document
  • Exporting the document (to HTML, or PDF and other formats supported by the FO renderer)

You don’t have to use this – in that existing code should continue to work – but the facade is the right way to do things.  Behind the facade is a major rethink/cleanup to the export architecture/implementation, contributed by Alberto.

MOXy

The key technology underlying docx4j – and a major differentiator from Apache POI – is JAXB.

There is a JAXB reference implementation; the JAXB baked into Java 6 and 7 is based on it.

Prior to v3, you had to use the reference implementation, or the implementation included in the JDK.

With v3, you can choose to use EclipseLink MOXy instead.  To do so, simply include docx4j-MOXy-JAXBContext-3.0.0.jar and the MOXy jars on your classpath.

Sample code

The docx4j samples have been relocated to src/samples.

docx4j from GitHub in Eclipse

May 18th, 2012 by Jason

This post is old.  Please see instead 2015/06/docx4j-from-github-in-eclipse-5-years-on/

docx4j is now on GitHub!  https://github.com/plutext/docx4j

This should make it easier for users to maintain their own branches (public or private), and contribute improvements back.

As of now, GitHub is the project’s authoritative version control.  We’re no longer updating the existing svn repository.

It’s pretty easy to work with docx4j sources in Eclipse. This post shows you how.

First, make sure you have eGit installed in Eclipse.  Install it from here.  On Windows, it is also useful to have msysgit.  Refer elsewhere for how to set these up. Update: there is a GitHub Windows client now (I haven’t tried it) which apparently includes msysgit.

You also need m2eclipse.

Assuming you’ve done all that, setting up the docx4j source code is just a few steps.

But first, be aware there is a difference between cloning and forking.  Cloning gives you a copy of the source code you can work on, but without more, no easy way to contribute changes back.  Forking sets you up with the source code, and makes it easy to contribute changes back.

If you think you might be making changes to the docx4j source code, you’re probably best to create a fork on GitHub right from the start.

Step 1 (optional, but recommended): To create a fork, log in to GitHub, visit https://github.com/plutext/docx4j then press the “Fork” button.

Step 2: Create your local repository (git clone)

This can be done from within Eclipse, or using Git Gui (easiest), or Git Bash Shell.

To do it from within Eclipse, File > Import .. > Repositories from GitHub:

If you forked docx4j, find your fork (it might not appear immediately, which is why Git Gui or Git Bash Shell are better for this step), select it, and click next.

If you didn’t fork docx4j, type ‘docx4j’ then press ‘search’.  The plutext/docx4j repository should come up:

Select plutext/docx4j, then click next.

This creates a local git repository on your computer.

Step 3: Now you need to import that repository into Eclipse as a project:

File > Import .. > Projects from Git

Eclipse should find the existing project settings:

(If it didn’t, and you had to use the new projects wizard, be sure to set the file location to wherever your git repository is, rather than letting Eclipse create a new empty project in the workspace.)

Now you should have a docx4j project in Eclipse, and it should be properly configured (since the project settings come with the project).

You should be done. But if something isn’t right, you can configure it manually (see further below).

Next steps?  Improve the docx4j source code in Eclipse :-), then Team > Commit, to commit those changes to your local repository.

Made a change which would be useful to others?  If you forked docx4j as per step 1 above, you can push your changes to your repository on GitHub, then send a pull request.

If you didn’t fork docx4j, do that now on GitHub, then configure your local repository to push to your fork.  After that you can push your changes and send a pull request.  Other docx4j users will thank you for this :-)

Manual configuration:

Configure > Convert to Maven Project

Properties > Java Compiler > Compiler compliance level: change to 1.6

Java Build Path > Libraries: remove 1.5 system library; Add Library … JRE System Library .. 1.6

Java Build Path > Source: check none of the entries say “Excluded: **” (remove the exclusion)

JAXB can be made to run on Android

May 17th, 2012 by Jason

A customer asked me to prepare a sample Android project which converts docx to HTML.

The result is AndroidDocxToHtml

Since docx4j relies heavily on JAXB, the key to getting it working was getting JAXB – the reference implementation – to run on Android.

Android presents us with a number of challenges:

  1. it won’t let you add a jar which includes classes in the javax.xml namespace (which is where the JAXB API lives)
  2. JAXB uses JAXP 1.3 DatatypeFactory, but Android doesn’t provide it
  3. JAXB uses javax.activation.DataHandler
  4. Dalvik has a limit of 65536 method references per dex file
  5. it doesn’t support package level annotations (which JAXB uses, and which in docx4j supply namespaces)

Ill-advised or mistaken usage of a core class (java.* or javax.*)

You’ll get this message if you try to add a jar containing classes in java.* or the following javax packages:

accessibility crypto imageio management naming
net print rmi security sound sql swing transaction
xml

Android doesn’t provide javax.xml.bind, and it won’t let you add it yourself.  It forces you to re-package it.  Just like on Google AppEngine, until Google eventually added it.
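The rule above can be sketched as a simple prefix check.  This is only an illustration of which class names fall foul of it; the class and method names are made up for this post, and it is not the logic the Android build tools actually use.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative check only; this is not the Android build tool's actual logic.
class CoreClassCheck {

    // The javax sub-packages listed above.
    private static final List<String> RESERVED_JAVAX = Arrays.asList(
            "accessibility", "crypto", "imageio", "management", "naming",
            "net", "print", "rmi", "security", "sound", "sql", "swing",
            "transaction", "xml");

    // True if the fully qualified class name is in java.* or a listed javax.* package.
    static boolean isReserved(String className) {
        if (className.startsWith("java.")) {
            return true;
        }
        if (!className.startsWith("javax.")) {
            return false;
        }
        String rest = className.substring("javax.".length());
        int dot = rest.indexOf('.');
        String firstSegment = dot < 0 ? rest : rest.substring(0, dot);
        return RESERVED_JAVAX.contains(firstSegment);
    }

    public static void main(String[] args) {
        // JAXB's API lives under javax.xml, so it is rejected.
        System.out.println(isReserved("javax.xml.bind.JAXBContext"));
        // javax.activation is not on the list, so that jar can be added as-is.
        System.out.println(isReserved("javax.activation.DataHandler"));
    }
}
```

This is why javax.activation can simply be added as a jar (see below), whereas the JAXB API has to be moved to a different package name.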

OK, done that; see https://github.com/plutext/jaxb-2_2_5_1/tree/android2 (the 2 in android2 is meaningless)

Repackaging is easy enough; the problem with it is that any library which uses the repackaged code, must also be changed.  In the case of docx4j, this means a new branch, and ongoing maintenance.

JAXB uses JAXP 1.3 DatatypeFactory, but Android doesn’t provide it

com.sun.xml.bind invokes javax.xml.datatype.DatatypeFactory.newInstance, whereupon Android throws javax.xml.datatype.DatatypeConfigurationException: Provider org.apache.xerces.jaxp.datatype.DatatypeFactoryImpl not found.

Easy solution: jar it up and provide it.

JAXB uses javax.activation.DataHandler

Easy solution: use the activation and additionnal jars from http://code.google.com/p/javamail-android/downloads/list

Dalvik  limit of 65536 method references per dex file

This is more an issue running docx4j on Android than one related to JAXB, but it is worth noting.  We’re running very close to this limit.  Vote for the issue at http://code.google.com/p/android/issues/detail?id=7147

Also, you may need to give Eclipse more heap space (the symptom is the error ‘Unable to execute dex: Java heap space’).  In eclipse.ini, I used:

-Xms256m

-Xmx4096m

In Eclipse, Window > Preferences > General > Show Heap Status gives you an entry on the bottom row which is useful.

Just when I thought it would all work…

I found that my XML was not unmarshalling, because it contains namespaces, and for some reason the objects in my JAXB were being read as not having any.

The problem is that Android doesn’t support package annotations: http://code.google.com/p/android/issues/detail?id=16149 (vote), but JAXB needs to read them.  For example:

@javax.xml.bind.annotation.XmlSchema(namespace = "http://schemas.openxmlformats.org/package/2006/relationships", elementFormDefault = javax.xml.bind.annotation.XmlNsForm.QUALIFIED)

I ended up devising a simple minded way to tell JAXB about these programmatically.  See Context.java.   Hmmm, I probably should have created my own RuntimeInlineAnnotationReader implementation (Google ‘JAXBIntroductions’).
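To give a feel for the idea, here is a minimal sketch of a programmatic lookup standing in for the package-level annotation.  It is not the code in Context.java: the class, its methods, and the package name used in main are all invented for illustration, and it doesn’t show how the values are handed to JAXB.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a lookup table standing in for @XmlSchema in package-info.java.
// Not the code in Context.java; all names here are invented for illustration.
class PackageNamespaces {

    private static final Map<String, String> NAMESPACE_BY_PACKAGE = new HashMap<>();

    // Registers the XML namespace that a package-level @XmlSchema would have declared.
    static void register(String packageName, String namespaceUri) {
        NAMESPACE_BY_PACKAGE.put(packageName, namespaceUri);
    }

    // Looks up the namespace for a class by its package name; "" means no namespace.
    static String namespaceFor(Class<?> type) {
        String name = type.getName();
        int lastDot = name.lastIndexOf('.');
        String packageName = lastDot < 0 ? "" : name.substring(0, lastDot);
        String ns = NAMESPACE_BY_PACKAGE.get(packageName);
        return ns == null ? "" : ns;
    }

    public static void main(String[] args) {
        // The package name is an assumption; the namespace is the one from the annotation above.
        register("org.example.relationships",
                "http://schemas.openxmlformats.org/package/2006/relationships");
        System.out.println(namespaceFor(String.class).isEmpty());
    }
}
```

The lookup uses the class name rather than reflection on the package, since reading the package annotation is exactly what doesn’t work here.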

That done, it more or less works (if you need support for other package level annotations, you’ve got a bit more to do).   The re-packaged JAXB is here.  You can build it using ant -f build-repackaged.xml dist

It should work on Android 3 or 4.

To use it, where your code would otherwise import javax.xml.bind, use ae.java.xml.bind.

docx4j 2.7.1 released

October 29th, 2011 by Jason

I’m pleased to announce the release of docx4j 2.7.1.  It was actually released 2 weeks ago, but this announcement has been delayed until I was able to publish the accompanying post on docx4j now being in Maven Central.

What is docx4j?

docx4j is an open source (Apache v2) library for creating, editing, and saving OpenXML “packages”, including docx, pptx, and xlsx.  It is similar to Microsoft’s OpenXML SDK, but for Java rather than .NET.   It uses JAXB to create the Java objects out of the OpenXML parts.

Notable features for docx include export as HTML or PDF, and CustomXML databinding for document generation (including our OpenDoPE convention support for processing repeats and conditions).

The docx4j project started in October 2007.

What’s new?

This is mainly a maintenance release; things of note include:

  • Preparation for including docx4j in Maven Central
  • mc:AlternateContent preprocessor, allowing graceful degradation of Word 2010 specific content
  • docx4j.properties, supports configuration of default page size, margins, orientation; also ability to set some of the doc props metadata (Application & AppVersion; dc.creator & dc.lastModifiedBy).
  • HtmlExporterNG2, (Pdf)Conversion, SvgExporter: storing any images is delegated to a ConversionImageHandler that may be passed as a conversion parameter. Default implementation: DefaultConversionImageHandler
  • OpenDoPE changes – see summary post in the sub-forum

Where do you get it?

Binaries: You can download a jar alone or a tar.gz with all deps or pick and choose.

Source: Check out the source from SVN (use the pom.xml file to satisfy the dependencies, eg with m2eclipse as explained in the Maven blog post, or download them from one of the links above)

Maven: From Maven Central; please see the blog post referenced above.

Getting Started

See the “Getting Started” guide.

Thanks to our contributors

A number of contributions have made this release what it is; thanks very much to those who contributed.

Contributors to this release and a more complete list of changes may be found in README.txt

Hello Maven Central

October 29th, 2011 by Jason

With version 2.7.1, docx4j – a library for manipulating Word docx, PowerPoint pptx, and Excel xlsx xml files in Java – and all its dependencies are available from Maven Central.

This makes it really easy to get going with docx4j.  With Eclipse and m2eclipse installed, you just add docx4j, and you’re done.  No need to mess around with manually installing jars, setting class paths etc.

This post demonstrates that, starting with a fresh OS (Win 7 is used, but these steps would work equally well on OSX or Linux).

Step 1 – Install the JDK

For the purposes of this article, I used JDK 7, but docx4j works with Java 6 and 1.5.

Step 2 – Install Eclipse Indigo (3.7.1)

I normally download the version for J2EE developers. Unzip it and run eclipse.

Step 3 – Install m2eclipse.

In Eclipse, click Help > Install New Software.

Type “http://download.eclipse.org/technology/m2e/releases” in the “Work with” field as shown:

then follow the prompts.

Step 4 – Create your Maven project

In Eclipse, File > New > Project.., then choose Maven project

You should see:

Check “Create a simple project (skip archetype selection)” then press next.

Allocate group and artifact id (what you choose as your artifact id will become the name of your new project in Eclipse):

Press finish

This will create a project with directories using Maven conventions:

(Note: If your starting point is a new or existing Java project in Eclipse, you can right click on the project, then choose Configure > Convert to Maven project)

Step 5 – Add docx4j to your POM

Double Click on pom.xml

Next click on the dependencies tab, then click the “add dependency” button, and enter the docx4j coordinates as shown in the image below:

The result is this pom:


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>mygroup</groupId>
  <artifactId>myartifact</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
  	<dependency>
  		<groupId>org.docx4j</groupId>
  		<artifactId>docx4j</artifactId>
  		<version>2.7.1</version>
  	</dependency>
  </dependencies>
</project>

Ctrl-S to save it.

m2eclipse may take some time to download the dependencies.

When it has finished, you should be able to see them:

Step 6 – Create HelloMavenCentral.java

If you made a Maven project as per step 4 above, you should already have src/main/java on your build path.

If not, create the folder and add it.

Now add a new class:

import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class HelloMavenCentral {

	public static void main(String[] args) throws Exception {
		
		WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
		
		wordMLPackage.getMainDocumentPart()
			.addStyledParagraphOfText("Title", "Hello Maven Central");

		wordMLPackage.getMainDocumentPart().addParagraphOfText("from docx4j!");
				
		// Now save it 
		wordMLPackage.save(new java.io.File(System.getProperty("user.dir") + "/helloMavenCentral.docx") );
		
	}	
}

Step 7 – Click Run

When you click run, all being well, a new docx called helloMavenCentral.docx will be saved.

You can open it in Word (or anything else which can read docx), or unzip it to inspect its contents.
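If you’d rather inspect the contents from Java, a docx is just a zip archive, so the standard java.util.zip classes can list its parts.  Here is a minimal sketch; the class and method names are made up for this post and aren’t part of docx4j, and the file location assumes you ran HelloMavenCentral as above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of docx4j: lists the parts inside a docx (a zip archive).
class ListDocxParts {

    // Returns the entry names in the archive, sorted for stable output.
    static List<String> partNames(Path docx) {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Same location the HelloMavenCentral sample saves to.
        Path docx = Paths.get(System.getProperty("user.dir"), "helloMavenCentral.docx");
        for (String name : partNames(docx)) {
            System.out.println(name);
        }
    }
}
```

Among the entries you should expect to see [Content_Types].xml and word/document.xml, the latter being the main document part the sample added its paragraphs to.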

Step 8 – Adding docx4j.properties

One final thing. If you plan on creating documents from scratch using docx4j, it is useful to set paper size etc, via docx4j.properties. Put something like the following on your classpath:

# Page size: use a value from org.docx4j.model.structure.PageSizePaper enum
# eg A4, LETTER
docx4j.PageSize=LETTER
# Page margins: use a value from org.docx4j.model.structure.MarginsWellKnown enum
docx4j.PageMargins=NORMAL
docx4j.PageOrientationLandscape=false

# Page size: use a value from org.pptx4j.model.SlideSizesWellKnown enum
# eg A4, LETTER
pptx4j.PageSize=LETTER
pptx4j.PageOrientationLandscape=false

# These will be injected into docProps/app.xml
# if docx4j.App.write=true
docx4j.App.write=true
docx4j.Application=docx4j
docx4j.AppVersion=2.7.1
# of the form XX.YYYY where X and Y represent numerical values

# These will be injected into docProps/core.xml
docx4j.dc.write=true
docx4j.dc.creator.value=docx4j
docx4j.dc.lastModifiedBy.value=docx4j

#
#docx4j.McPreprocessor=true

# If you haven't configured log4j yourself
# docx4j will autoconfigure it.  Set this to true to disable that
docx4j.Log4j.Configurator.disabled=false
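The file uses the standard Java properties format: key=value pairs, with lines starting with # treated as comments (so the McPreprocessor line above is inactive until you remove the #).  The sketch below shows those semantics using plain java.util.Properties.  It is illustrative only; docx4j has its own code for loading the file, and the class, method names and the default values passed in here are assumptions, not docx4j’s.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative only: docx4j has its own loader for docx4j.properties.
class Docx4jPropertiesSketch {

    // Parses properties-file text into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads a boolean setting, falling back to a default when the key is absent.
    static boolean flag(Properties props, String key, boolean defaultValue) {
        return Boolean.parseBoolean(props.getProperty(key, String.valueOf(defaultValue)));
    }

    public static void main(String[] args) {
        Properties props = parse(
                "# Page size\n"
              + "docx4j.PageSize=LETTER\n"
              + "docx4j.PageOrientationLandscape=false\n"
              + "#docx4j.McPreprocessor=true\n");
        System.out.println(props.getProperty("docx4j.PageSize"));
        System.out.println(flag(props, "docx4j.PageOrientationLandscape", false));
        // Commented-out keys are simply absent.
        System.out.println(props.getProperty("docx4j.McPreprocessor"));
    }
}
```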

And that’s it. For more information on docx4j, see our Getting Started document.


docx4j has a new home

August 6th, 2011 by Jason

For reasons best known (or only known) to Google, dev.plutext.org has never been on the first page of results when you search for “docx java”, despite all the relevant posts in our forums over more than 3 years.

I can only think Google doesn’t at all like a hostname other than “www”.

So I’ve moved everything to www.docx4java.org

This shouldn’t impact you (other than having to find this new site, and update any bookmarks) unless you are using svn and have docx4j checked out.

If you have the docx4j repository checked out, you’ll want to do something like:

 svn switch --relocate http://dev.plutext.org/svn/docx4j/trunk/docx4j http://www.docx4java.org/svn/docx4j/trunk/docx4j

If you are on Windows and using TortoiseSVN, use Tortoise’s “relocate” command (not its “switch” command).

That should make your SVN checkout work again.

There may be various broken or outdated links on the website.  I guess I’ll fix these over time.

If you encounter any other issues, then please post to http://www.docx4java.org/forums/announces/docx4j-has-a-new-home-t815.html

makeofficebetter.com shut down

March 18th, 2010 by Jason

In the months since August 2009, interested users submitted ideas to makeofficebetter.com

The Microsoft employees who ran that site have shut it down.  Not stopped accepting new submissions, but shut it down entirely.

As a result, all the community submitted data is lost to the community. Or has been taken from us, since it is no longer shared.

Why did Google acquire Docverse?

March 10th, 2010 by Jason
Office 2010 Tech Guarantee will defer $300M-$350M of revenue from Q3 .. people who would otherwise buy in Q3, but wait until the TG is available.

People have been asking me why Google bought Docverse.

Surely, Google already has the collaboration smarts?  After all, it was Google Docs which made document collaboration mainstream.  And it is Google Wave which is arguably now taking it to the next level.  Google also employs Neil Fraser, and it recently bought Etherpad.

So what does Docverse give them?  And why pay so much?

It’s not about getting the people – Docverse is a small team – although additional engineers with domain knowledge are surely nice to have.

What this is about is taking away the reasons for upgrading to Office 2010, and more particularly, Sharepoint 2010.  Any business which takes Sharepoint 2010 is making a commitment to Microsoft technology for the next decade or so, which effectively shuts Google enterprise products out, and might even lead these customers to use IIS etc for their consumer web sites (which would also be bad for Google).

So Google is doing what it can to give businesses reason to stop and think.

In the 6 months ended 31 December 2009, Microsoft’s Business Products Division had revenue of $9.149 billion, and operating income of $5.867 billion.  Office is responsible for around 90% of that.

What would it be worth to Google, if it could put a 5% dent in those figures? 5% of $9 billion is $450 million. 0.5% is $45 million.

Put one way, if the people responsible for just 0.5% of Office purchase decisions look at Google + Docverse and say “hey, we can stick with the version of Office we’ve got; we don’t need to buy Office 2010 and Sharepoint to do real time collaboration”, then the Docverse acquisition has made sense for Google.

But really, it’s about the larger ecosystems, not just the Office purchase.  An Office purchase is a commitment to Windows on the client, and possibly Windows on the server.  And it has network effects along the supply chain (people you exchange documents with).  So preventing an Office purchase frees up a lot of other spend.

Now, Google needs to prove that with Google you get:

  • the ability to keep using your existing Microsoft Office (Docverse’s contribution)
  • real-time collaboration (without Office 2010 or Sharepoint 2010)
  • web-based editing if/when you need it

Docverse gives Google slick looking Add-Ins for Word, Powerpoint and Excel.

Time is of the essence.  Office 2010 will be launched for businesses on May 12, and available online/retail in June.

The add-ins are worth a few months’ head start.  (So maybe it is about the people after all?)

Now Google needs to integrate Docverse into Google Apps.  Rip/replace of the existing Docverse back-end (and probably much of their Word Add-In, since it sends the whole document every time you save, not just the diffs – something Plutext has had right since the beginning) will take a while.  However, the rip/replace isn’t necessary for a rudimentary integration into Google Docs.  What is critical is to make Docverse’s server-side differencing work on Google scale and interoperate with the Google Docs webapp.  Same for presentations and slides.

It’ll be interesting to see how quickly this can be done.

Importing Word documents into Google Wave

February 9th, 2010 by Jason

Plutext has released a robot for Google Wave which you can use to convert Microsoft Office Word documents into Wave content.

The robot is at docxwave@appspot.com

This is especially useful if your Word document contains tables or images, because copy/pasting from Word leaves them out. Adding the document as an attachment in Wave wouldn’t be the answer either, because that doesn’t bring the power of Wave to bear on the doc at all.

This wave was the announcement and is for support (Wave account required):

[wave id=”googlewave.com!w+Kb5sDrZkA” color=”#000000″ bgcolor=”#FFFFFF”]