Archive for the ‘docx’ Category

How to try Plutext for yourself

March 3rd, 2009 by Jason

Here is a screencast which walks you through sharing your own document, and trying our collaboration features:

Of course, you can just play with one of the pre-existing shared documents.

The video width is 1280 pixels, so if you are browsing in a narrow window, you’ll need to expand your browser window to see it properly. (Everybody has screens that wide these days, don’t they, unless they are on mobile?)

For completeness:

Plutext collaboration for Word: new features

March 2nd, 2009 by Jason

We’ve just published a new build of the Word Add-In which, among other things, supports replication of images and comments between users.

For a good while now, with Plutext you’ve been able to be in a Word document at the same time as your co-workers – provided all you were doing was working on tables and paragraphs (editing them, inserting, deleting or moving them around).

With this latest release, you can add images and Word comments, and have them replicate properly between Word 2007 users.

Here is a screencast of this in action:

If you want to play with this yourself, you can download our Word Add-In and give it a shot!

For username & password, please see here. The password is “tester”.

For detailed instructions, see this PDF, or this earlier screencast.

If you’d like to chat about your own Plutext installation, please contact us using this form.

Collaborate on a Word doc with docx4all

November 16th, 2008 by Jason

docx4all has now reached the point where you can collaborate happily with a Word user, both working on the document at the same time.

This screencast shows a docx4all user and a Word user doing that:

docx4all will work on any platform if you have Java 6 installed – including Windows, OSX, or Linux.

You can try collaborating now, in your web browser, by clicking here (warning: ~10 MB). The download is, of course, one-time; next time, it will start more quickly.

That link takes you to the docx4all applet, which does collaboration in your web browser.

You can also run docx4all as a desktop application – the functionality is identical.

The nice thing about the docx4all experience is that with just one click you can be collaborating. OK, a couple of clicks: one to start docx4all, and another to do File > Open.

Because all changes are versioned, from the Plutext menu you can see:

  • a history of all the changes which have been made to a given content control
  • a version of the document showing the most recent change to each paragraph

docx4j v2.1.0 released

November 11th, 2008 by Jason

We’re pleased to announce that we’ve released v2.1.0 of docx4j.  Get it from our downloads page.

docx4j is an open source Java library for manipulating OpenXML WordprocessingML documents, released under the Apache software licence. docx is the default file format of Word 2007 (part of Microsoft Office 2007), and, more or less unchanged, part of an ISO standard.

v2.1.0 is mainly a maintenance release.

Attention has been paid to ease of use of hyperlinks, images, and headers/footers.

The HTML output has been redone to use the XSLT from the OpenXMLViewer project; it can be configured to save images as files, and automatic list numbers are handled.

This release should also work under Java 1.5, now that I have rebuilt fop-fonts. I had contributed TTC (TrueType Collection) handling code to FOP, and it was accepted, so fop-fonts now uses that code (ie the patch which produces fop-fonts is that much smaller).

docx4j v2.0 released

July 22nd, 2008 by Jason

We’re pleased to announce that we’ve released v2.0 of docx4j.

docx4j is an open source Java library for manipulating OpenXML WordprocessingML documents, released under the Apache software licence. docx is the default file format of Word 2007 (part of Microsoft Office 2007).

docx4j supports the following:

  • Open existing docx (from filesystem, SMB/CIFS, WebDAV using VFS)
  • Create new docx (just one line of code)
  • Programmatically manipulate the docx document (of course), including tables, images
  • Import a binary doc (proof of concept)
  • Import/export Word 2007’s xmlPackage (pkg) format
  • Save docx to filesystem as a docx (ie zipped), or to JCR (unzipped)
  • Apply transforms, including common filters
  • Export as HTML or PDF
  • Diff/compare paragraphs or sdt (content controls), outputting OpenXML with changes marked up
  • Font support (font substitution, and use of any fonts embedded in the document)
  • Use the power of JAXB to do other cool stuff
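To give a feel for the diff item above (“outputting OpenXML with changes marked up”), here is a minimal, JDK-only Java sketch. It is not docx4j’s diff code: it assumes plain-text paragraphs, compares them word by word using a longest-common-subsequence table, and wraps removed words in w:del/w:delText and added words in w:ins, the elements WordprocessingML uses for tracked changes. The class name ParagraphDiff and the author string "diff" are hypothetical, and the output is a w:p fragment without a namespace declaration.

```java
class ParagraphDiff {

    // Escape the characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Split into words, keeping each trailing space attached to its word.
    static String[] tokens(String s) {
        return s.isEmpty() ? new String[0] : s.split("(?<= )");
    }

    /** Returns a w:p in which words only in 'before' are wrapped in w:del
     *  and words only in 'after' are wrapped in w:ins. */
    static String diff(String before, String after) {
        String[] a = tokens(before);
        String[] b = tokens(after);
        int n = a.length, m = b.length;

        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        StringBuilder out = new StringBuilder("<w:p>");
        int i = 0, j = 0, id = 1; // id: unique w:id for each revision mark
        while (i < n || j < m) {
            if (i < n && j < m && a[i].equals(b[j])) {
                // unchanged word: ordinary run
                out.append("<w:r><w:t xml:space=\"preserve\">")
                   .append(escape(a[i])).append("</w:t></w:r>");
                i++;
                j++;
            } else if (j == m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1])) {
                // word removed: deleted text must use w:delText, not w:t
                out.append("<w:del w:id=\"").append(id++).append("\" w:author=\"diff\">")
                   .append("<w:r><w:delText xml:space=\"preserve\">")
                   .append(escape(a[i])).append("</w:delText></w:r></w:del>");
                i++;
            } else {
                // word added
                out.append("<w:ins w:id=\"").append(id++).append("\" w:author=\"diff\">")
                   .append("<w:r><w:t xml:space=\"preserve\">")
                   .append(escape(b[j])).append("</w:t></w:r></w:ins>");
                j++;
            }
        }
        return out.append("</w:p>").toString();
    }

    public static void main(String[] args) {
        System.out.println(diff("the quick fox", "the slow fox"));
    }
}
```

Each changed word gets its own run here; a fuller implementation would merge adjacent changes and carry over run formatting (w:rPr).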

Get it from here.

What is it about this release that warrants being labeled v2.0?

The new features include image support, diff, and xmlPackage. Another factor is the version numbering convention Microsoft has chosen for its Open XML SDK: it’s v2.0 that will be the first to contain an API for WordprocessingML.

So think of a “level 1” API as one which handles the Open Packaging conventions (basically, the unzipping step), but leaves you to handle the document (part) content using low level XML (DOM, SAX, etc).
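Here is roughly what working at “level 1” looks like, sketched with nothing but the JDK (java.util.zip plus DOM): unzip the package, pull out the main document part, and walk the raw XML yourself. Assumptions, for brevity: the main part is taken to be word/document.xml (a proper Open Packaging reader would find it via the officeDocument relationship in _rels/.rels), and the demo zip contains only that one part. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class Level1Reader {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Packaging step: find one part (zip entry) by name and return its bytes.
    static byte[] readPart(byte[] docx, String partName) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(docx))) {
            ZipEntry e;
            while ((e = zin.getNextEntry()) != null) {
                if (e.getName().equals(partName)) {
                    return zin.readAllBytes(); // reads to the end of the current entry
                }
            }
        }
        throw new IOException("part not found: " + partName);
    }

    // Content step, left to you at "level 1": parse the part with DOM and walk w:p / w:t.
    static List<String> paragraphTexts(byte[] docx) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // WordprocessingML elements live in the W namespace
        Document dom = f.newDocumentBuilder()
            .parse(new ByteArrayInputStream(readPart(docx, "word/document.xml")));
        List<String> texts = new ArrayList<>();
        NodeList paras = dom.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            NodeList ts = ((Element) paras.item(i)).getElementsByTagNameNS(W, "t");
            StringBuilder sb = new StringBuilder();
            for (int k = 0; k < ts.getLength(); k++) {
                sb.append(ts.item(k).getTextContent());
            }
            texts.add(sb.toString());
        }
        return texts;
    }

    // Demo helper: a zip holding only word/document.xml. A real docx also
    // needs [Content_Types].xml and _rels/.rels.
    static byte[] demoDocx(String documentXml) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bos)) {
            zout.putNextEntry(new ZipEntry("word/document.xml"));
            zout.write(documentXml.getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:document xmlns:w='" + W + "'><w:body>"
            + "<w:p><w:r><w:t xml:space='preserve'>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
            + "<w:p><w:r><w:t>Second</w:t></w:r></w:p>"
            + "</w:body></w:document>";
        System.out.println(paragraphTexts(demoDocx(xml)));
    }
}
```

Everything above the DOM calls is the packaging work a “level 1” API would do for you; everything from getElementsByTagNameNS onwards is still yours to write.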

A “level 2” API is one which lets you manipulate the part content at a higher level. At the very least, this would include objects to represent paragraphs, tables, styles etc. But you’d also expect it to be easy, for example, to add a paragraph using a specified style (maybe this is “level 3”? In any case, docx4j can do it).
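For a sense of what “add a paragraph using a specified style” saves you from, this JDK-only sketch builds the w:p / w:pPr / w:pStyle / w:r / w:t nesting by hand with DOM. The helper addStyledParagraph is hypothetical (it is not docx4j’s API), and it assumes the style ID passed in (e.g. Heading1) is already defined in the document’s styles part.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class StyledParagraphs {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical "level 2"-style helper: one call instead of hand-building
    // <w:p><w:pPr><w:pStyle w:val="..."/></w:pPr><w:r><w:t>...</w:t></w:r></w:p>
    static Element addStyledParagraph(Document doc, Element body, String styleId, String text) {
        Element p = doc.createElementNS(W, "w:p");
        Element pPr = doc.createElementNS(W, "w:pPr");
        Element pStyle = doc.createElementNS(W, "w:pStyle");
        pStyle.setAttributeNS(W, "w:val", styleId); // a style ID defined in the styles part
        pPr.appendChild(pStyle);
        p.appendChild(pPr);

        Element r = doc.createElementNS(W, "w:r");
        Element t = doc.createElementNS(W, "w:t");
        t.setTextContent(text);
        r.appendChild(t);
        p.appendChild(r);

        body.appendChild(p);
        return p;
    }

    // Serialize the DOM so the generated markup can be inspected.
    static String serialize(Document doc) throws Exception {
        Transformer tr = TransformerFactory.newInstance().newTransformer();
        tr.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        tr.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().newDocument();
        Element document = doc.createElementNS(W, "w:document");
        doc.appendChild(document);
        Element body = doc.createElementNS(W, "w:body");
        document.appendChild(body);

        addStyledParagraph(doc, body, "Heading1", "Introduction");
        System.out.println(serialize(doc));
    }
}
```

The point of a “level 2” API is that callers see only the one-line call; the five nested elements, and the rule that the style is referenced by ID rather than by display name, stay hidden.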

Given that docx4j brought a “level 2” WordML API to the Java world 6 months ago, it is appropriate that it be labelled version 2.0.

Click to try docx4all v0.2

May 3rd, 2008 by Jason

Jo and I are pleased to have just uploaded a new version of docx4all for you to try.

We’ve added quite a few features since I last blogged about docx4all (21 Feb).

New features include:

The VFS file chooser allows docx4all to open documents not just from the local file system, but also from a WebDAV server (such as Alfresco), and potentially, CIFS etc.  To do this, docx4all uses VFSJFileChooser, and webdavclient4j (a project we’ve started to address the gap left when Apache retired Slide, including its WebDAV client).

The incoming document filter converts certain features of WordprocessingML which docx4all can’t yet handle into something it can. Examples include proofErr, hyperlink, and lastRenderedPageBreak. This behaviour relies on a feature of docx4j which makes it easy to apply a transform to a docx package (by converting it to pkg:package format).
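A minimal sketch of such a filter, assuming it can be expressed as an XSLT 1.0 stylesheet run through the JDK’s javax.xml.transform: the stylesheet copies everything, drops w:proofErr and w:lastRenderedPageBreak, and unwraps w:hyperlink so its runs survive as ordinary text. This is illustrative only, not docx4all’s actual filter (the class name IncomingFilter is hypothetical), and for brevity it is applied to a single paragraph rather than to a whole pkg:package.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IncomingFilter {
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
      // identity template: copy every node and attribute unchanged by default
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
      // drop proofing marks and rendering hints entirely
      + "<xsl:template match='w:proofErr|w:lastRenderedPageBreak'/>"
      // unwrap hyperlinks: keep their child runs, lose the w:hyperlink element itself
      + "<xsl:template match='w:hyperlink'><xsl:apply-templates select='node()'/></xsl:template>"
      + "</xsl:stylesheet>";

    static String filter(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String p = "<w:p xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
            + "<w:proofErr w:type='spellStart'/>"
            + "<w:hyperlink w:anchor='top'><w:r><w:t>link</w:t></w:r></w:hyperlink>"
            + "<w:r><w:lastRenderedPageBreak/><w:t>text</w:t></w:r></w:p>";
        System.out.println(filter(p));
    }
}
```

Unwrapping the hyperlink discards its target, which is only acceptable while the editor can’t display links anyway.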

Docx4all can’t yet render tables (let alone edit them), but we’re working on changing that.

docx4j now released under Apache License

April 10th, 2008 by Jason

We’re pleased to announce that docx4j is now available under the Apache License (v2).

This is a response to feedback on an earlier post.  This is also the last license change we’ll be making to docx4j. Word documents are mostly manipulated in corporate environments.  This change removes barriers to adoption of docx4j by business and institutions.

docx4j uses to efficiently turn streams inside out. That package had been available under the GPL.  Its author, Merlin Hughes, today kindly released it under v2 of the Apache License, so we now use it under that license.

There’s a new nightly build of docx4j available from the downloads page if you want to grab it.  This build can load/save to/from a WebDAV server – more on that in another post.

Microsoft Office Online .. soon?

March 3rd, 2008 by Jason

Nick Carr has sparked speculation that Microsoft will soon unveil its strategy for bringing its Office suite online – which to me means a way of working with Office documents on any computer which has an internet connection.  If you are connected, I’d expect you to be able to collaborate with others in real time; if you are not connected, I’d expect the software to work in offline mode.

When I say “any computer”, I don’t mean to restrict that to any particular operating system (and indeed, Silverlight runs on the Mac, and Microsoft has announced it is working with Novell on a Linux implementation). What good is collaboration software if some of the people you need to collaborate with can’t play?

I thought I’d make some predictions about the business model.

There seem to be 2 key questions:

  • does each end user pay, or does a collaboration originator pay for the right to invite a certain number of collaborators?
  • what support for Mac and Linux users, and when?

Whether each individual user is required to pay, or the originator pays, will reveal much about how Microsoft regards its online offering. The latter model, that the person who originates a collaboration session pays for a certain number of people to be able to collaborate (ie whatever their platform), would show that their focus is firmly on collaboration. This is the model we would use for any Plutext SAAS offering (available to people who don’t want to install Plutext server internally, for free or a fee).

Here are my predictions:

  1. Enterprise version (ie behind the firewall). There will be a version an enterprise can install on its SharePoint server, for those businesses which are not comfortable with their documents being hosted externally. I’m sure Microsoft can work out how to give access to people outside the firewall as necessary. An enterprise licensee will be able to invite people outside the enterprise without charge.
  2. Cloud version. I expect there will be a cloud version for SMBs. I think you will be able to use this for free, provided you have a license for the traditional Office product. You will definitely need the traditional product (2007 version) to originate collaboration around a document (ie invite other users) – unless you are prepared to pay full price for the online offering. Maybe anyone will be able to accept a collaboration invitation (ie whether or not they are licensed to use Office), making the “who pays” question moot. To create a new document (or print it?), I expect you will need to have a licence for the traditional Office product, or pay for the SAAS offering.
  3. Mac and Linux support. I think Microsoft will offer Mac support sooner or later, but delay any hint of support for Linux for as long as possible. This is because Linux is much more of a threat than OSX (two reasons: (1) Linux is free, and (2) it is very easy to install it on your existing Windows PC). That said, they might have it “only on Windows” to try to keep people there – until some critical tipping point is reached. I would say that even now, the only thing stopping Microsoft from seeking revenues from Linux users is the inevitable press headlines along the lines of “Microsoft admits defeat” that would come with this. The cost of this in terms of perception would surely outweigh any incremental revenues in the short term. Mac users may be able to use it for free – provided they had an Office license they were able to associate with their online user ID.
  4. docx only. The documents which come out of this online service will be docx documents, not binary or RTF.  This will help to make the new format ubiquitous.

I wonder whether the collaboration protocols will be published under the recent interoperability initiative? If they are, the way would be open for a rich world, in which docx4all could potentially play… I’d be pleasantly surprised if they were, and there was nothing stopping someone from making a client or server of their own. If anyone else could create a server, then why not get rid of it altogether and go peer-to-peer? Maybe, just maybe, the thinking is that it would take forever for someone other than Microsoft to create a fully featured server, so third-party implementations are to be encouraged (as is presently the case for OpenXML), since Microsoft’s offering will always be the Rolls-Royce implementation which attracts the most usage, with the other implementations adding value to the ecosystem.

The announcement, if/when it comes, will be fascinating!

.docx to HTML or PDF using Java

January 13th, 2008 by Jason

Doug Mahugh recently mentioned someone using the DocX2Html.xsl that ships with SharePoint to preview DOCX files in HTML.

As it happens, we’ve just implemented HTML and PDF output in docx4j using a similar approach. We’re using the earlier WordML2HTML XSLT stylesheet available from Oleg Tkachenko. (It would be great if Microsoft also made the presumably newer DocX2Html.xsl that ships with SharePoint freely available).
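To show the general shape of the XSLT approach, here is a toy stylesheet (not WordML2HTML or DocX2Html.xsl) that maps body-level w:p elements to p and bold runs to b, run with the JDK’s built-in XSLT processor. The class name and stylesheet are hypothetical; real stylesheets also handle styles, numbering, tables and images.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class WordMLToHtml {
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'"
      + " exclude-result-prefixes='w'>"
      + "<xsl:output method='html' indent='no'/>"
      // document -> html/body, one <p> per body-level w:p (tables are skipped)
      + "<xsl:template match='/'><html><body>"
      + "<xsl:apply-templates select='//w:body/w:p'/></body></html></xsl:template>"
      + "<xsl:template match='w:p'><p><xsl:apply-templates select='w:r'/></p></xsl:template>"
      // a run whose properties contain w:b becomes <b> (w:val='0' is not checked here)
      + "<xsl:template match='w:r[w:rPr/w:b]'><b><xsl:apply-templates select='w:t'/></b></xsl:template>"
      // any other run: just its text
      + "<xsl:template match='w:r'><xsl:apply-templates select='w:t'/></xsl:template>"
      + "</xsl:stylesheet>";

    static String toHtml(String documentXml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(documentXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
            + "<w:body><w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Bold</w:t></w:r>"
            + "<w:r><w:t xml:space='preserve'> plain</w:t></w:r></w:p></w:body></w:document>";
        System.out.println(toHtml(doc));
    }
}
```

Keeping the conversion in a stylesheet means the Java side stays a few lines long, and the stylesheet can be swapped for a fuller one without touching the code.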

To create the HTML, we use Sun’s xhtmlrenderer (thanks Sun!). See the obligatory tutorial.

To create the PDF, we take the HTML, and run it through Sun’s pdf-renderer (thanks again, Sun). And again, the tutorial.

The icing on the cake is the PDF Viewer which comes with pdf-renderer. That will give us print preview and printing in docx4all.

Finally, thanks Lars for bringing pdf-renderer to my attention.

Styles and numbering

January 11th, 2008 by Jason

This week, thanks to JAXB, we added strongly typed content models for the Styles part, and the Numbering definitions part of a docx.

Have a look at and, used by their respective parts.