Archive for January, 2008

Alfresco issues – update

January 31st, 2008 by Jason

It looks like most (hopefully even all) of the weird behaviour I have been experiencing with Alfresco’s JCR API disappears if I explicitly wrap the actions performed in each session in a transaction using getRetryingTransactionHelper().
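To show the general shape of "wrap the work in an explicit transaction and retry on failure", here is a minimal, self-contained sketch. It is not Alfresco's API: the class RetryingTransactionSketch, its Transaction interface and the doInTransaction method are hypothetical stand-ins written for illustration, and unlike a real helper it retries on any exception rather than only on recoverable ones.

```java
import java.util.concurrent.Callable;

/** Illustrative stand-in only; this is NOT Alfresco's retrying transaction helper. */
final class RetryingTransactionSketch {

    /** Hypothetical minimal transaction handle; a real repository supplies its own. */
    interface Transaction {
        void begin();
        void commit();
        void rollback();
    }

    private final Transaction tx;
    private final int maxAttempts;

    RetryingTransactionSketch(Transaction tx, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.tx = tx;
        this.maxAttempts = maxAttempts;
    }

    /**
     * Runs the work inside an explicit transaction. If the work or the commit
     * throws, the transaction is rolled back and the work is attempted again,
     * up to maxAttempts times. The last failure is rethrown if every attempt fails.
     */
    <T> T doInTransaction(Callable<T> work) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            tx.begin();
            try {
                T result = work.call();
                tx.commit();
                return result;
            } catch (Exception e) {
                tx.rollback();
                last = e;
            }
        }
        throw last;
    }
}
```

The point of the pattern is that every group of session operations runs between an explicit begin and commit, instead of relying on an implicit transaction.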

According to Alfresco’s wiki, an implicit transaction should take care of this for you.  Well, some 4 days of pain tells me it doesn’t!

Plutext docx collaboration under Alfresco

January 26th, 2008 by Jason

Twelve days ago, I checked out Alfresco.

I thought Alfresco would be a good way to get access control sorted out. There are a number of other features in Alfresco which might prove interesting down the track, but access control was the immediate priority. Alfresco provides each user with a home directory, and lets them invite other people to access their resources.

I also think that plutext-style document collaboration would be a great fit for many of Alfresco’s customers. Like most other document management systems, Alfresco uses the classic check-out/check-in model (detested by users the world over!). plutext collaboration frees users from that paradigm.

By Monday, a week in, plutext was basically working with Alfresco. This included the Word 2007 add-in authenticating itself when it makes web service calls (something I hadn’t implemented before). I found a few little bugs in Alfresco which I reported (here and here), but everything was going remarkably smoothly.

Sweet, I thought I’d have a relaxed Tuesday, checking the code in, updating the build and wiki, before declaring success in a blog post.

Well, that wasn’t to be. It turns out there are some major issues (here and here) with Alfresco’s JCR support and/or repository which need to be resolved. It’s not so easy to identify simple test cases, since the problems seem to arise when a series of operations is performed in one session after another, and manifest themselves sometime later, but at least they are repeatable.

Hopefully the Alfresco guys will get onto these problems quickly. Otherwise I will have to learn more about Alfresco internals and its use of Hibernate than I’d care to!

Early next week (a week later than I expected) I will update the build procedures so you can easily build it for Alfresco, and then I’ll make sure it works with Jackrabbit again (we’d like to have a single content model that works for both repositories – more on that later).

If we can make good headway with the issues in Alfresco over the next week, we’ll probably regard that as our flagship configuration. If not, I’ll take another look at building access control around Jackrabbit. Although Jackrabbit lacks Alfresco’s bells and whistles, my experience with it (in the single-user load scenarios which are causing Alfresco problems) was trouble-free. That’s not to say I expect it to be perfect under heavy load, but it sounds very promising based on what Jukka wrote recently following the announcement of version 1.4.

.docx to HTML or PDF using Java

January 13th, 2008 by Jason

Doug Mahugh recently mentioned someone using the DocX2Html.xsl that ships with SharePoint to preview DOCX files in HTML.

As it happens, we’ve just implemented HTML and PDF output in docx4j using a similar approach. We’re using the earlier WordML2HTML XSLT stylesheet available from Oleg Tkachenko. (It would be great if Microsoft also made the presumably newer DocX2Html.xsl that ships with SharePoint freely available).
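For readers unfamiliar with the XSLT route, here is a minimal sketch of applying a stylesheet from Java with the standard javax.xml.transform API. It is not the docx4j code: the class name XsltToHtmlSketch, the toy stylesheet and the toy input (plain doc/para elements rather than real WordprocessingML) are assumptions made purely for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltToHtmlSketch {

    // Toy stylesheet (an assumption for illustration): each <para> becomes an HTML <p>.
    static final String TOY_XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/doc'>"
        + "<html><body><xsl:apply-templates select='para'/></body></html>"
        + "</xsl:template>"
        + "<xsl:template match='para'><p><xsl:value-of select='.'/></p></xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the given XSLT stylesheet to the given XML and returns the result as a string. */
    static String transform(String xml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<doc><para>Hello</para><para>World</para></doc>";
        System.out.println(transform(xml, TOY_XSLT));
    }
}
```

In a real pipeline the input would be the document's XML and the stylesheet would be something like the WordML2HTML one mentioned above. The Java plumbing stays much the same.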

To create the HTML, we use Sun’s xhtmlrenderer (thanks Sun!). See the obligatory tutorial.

To create the PDF, we take the HTML, and run it through Sun’s pdf-renderer (thanks again, Sun). And again, the tutorial.

The icing on the cake is the PDF Viewer which comes with pdf-renderer. That will give us print preview and printing in docx4all.

Finally, thanks Lars for bringing pdf-renderer to my attention.

Styles and numbering

January 11th, 2008 by Jason

This week, thanks to JAXB, we added strongly typed content models for the Styles part, and the Numbering definitions part of a docx.

Have a look at and, used by their respective parts.

Howto: create a new document with docx4j

January 11th, 2008 by Jason

I’ve added a page to the wiki, showing how easy it is to programmatically create a new document from scratch.

Tutorial: opening an existing document with docx4j

January 11th, 2008 by Jason

I’ve added a page to the wiki, showing how easy it is to programmatically open and edit an existing document.

docx4j license change

January 11th, 2008 by Jason

A note for the record that we’ve changed the docx4j license from the GPL v3 to the Affero General Public License v3.   All users of which we are aware are happy with this change.

The logic for the change is the same as the logic for licensing plutext-server under the Affero GPL.  That is, to ensure that people who use docx4j in a SAAS environment are treated the same as people who distribute docx4j to end users.

Licensing docx4j under an Apache style licence also has its attractions – let us know if this would make a difference to you.

OOXML, boolean values and binding

January 6th, 2008 by Jason

ST_OnOff is used extensively in the XML Schema. Here is the link (nice resource!).

Basically, it is used for things which should use the built in boolean schema data type:

This simple type specifies a set of values for any binary (on or off) property defined in a WordprocessingML document.

For example, the b (bold) element has an attribute @val of type ST_OnOff.

There are several problems with how this is done.

The first is that its possible values are “on, 1, or true”. OOXML should just use the XSD boolean data type, which doesn’t allow “on” (or “off”). For related comments, see here, here, and here. Denmark and France seem to be the strongest advocates of the use of xsd:boolean, and I hope they get their way.
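To make the difference concrete, here is a small sketch comparing the lexical values ST_OnOff accepts with those xsd:boolean accepts. The class and method names (OnOffLexical, parse, isXsdBoolean) are hypothetical and not part of docx4j or any schema tooling.

```java
final class OnOffLexical {

    /** Interprets an ST_OnOff lexical value: on/1/true mean true, off/0/false mean false. */
    static boolean parse(String value) {
        if (value == null) {
            throw new IllegalArgumentException("value must not be null");
        }
        switch (value) {
            case "on":
            case "1":
            case "true":
                return true;
            case "off":
            case "0":
            case "false":
                return false;
            default:
                throw new IllegalArgumentException("Not an ST_OnOff value: " + value);
        }
    }

    /** True if the value is in the lexical space of xsd:boolean (true, false, 1, 0). */
    static boolean isXsdBoolean(String value) {
        return "true".equals(value) || "false".equals(value)
            || "1".equals(value) || "0".equals(value);
    }
}
```

With this, OnOffLexical.parse("on") returns true, while OnOffLexical.isXsdBoolean("on") returns false. That gap is the extra case a consumer has to handle by hand.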

The second is that it is left to the specification text to say that if the attribute is omitted, its value is implied to be true. That should be expressed as part of the schema.

For CT_OnOff, it would be:

<xsd:complexType name="BooleanDefaultTrue">
<xsd:attribute name="val" type="xsd:boolean" default="true" />
</xsd:complexType>

I don’t think Denmark or anyone else made this second point.

The schema we are using in docx4j to generate classes uses these sorts of definitions instead of ST_OnOff or CT_OnOff.  For CT_OnOff, this results in a BooleanDefaultTrue type, which is used in fields like (for bold):

protected BooleanDefaultTrue b;

Which brings me to the third problem with ST_OnOff (and the schema in general), which is that it generates ugly code in JAXB and other binding frameworks (presumably .NET included). The built-in schema data types produce much nicer code.
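As a rough illustration of how simple the bound code can be when an attribute is declared as xsd:boolean with default="true", here is a sketch of the shape a binding framework typically produces. It is hand-written and simplified: the name BooleanDefaultTrueSketch is hypothetical, the JAXB annotations are left out so the class compiles on its own, and it is not the actual docx4j class.

```java
final class BooleanDefaultTrueSketch {

    // null means the val attribute was absent from the XML.
    private Boolean val;

    /** Returns the attribute value, falling back to the schema default of true when absent. */
    boolean isVal() {
        return val == null ? true : val;
    }

    /** Sets the attribute value; passing null restores the "absent" state. */
    void setVal(Boolean value) {
        this.val = value;
    }
}
```

Because the default lives in the schema, the "omitted means true" rule ends up in the getter, and callers never have to look it up in the specification text.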

As a general remark, running the schema through JAXB is a good way to find places where the schema can be improved. Schema design goals should include:

  1. that it can be processed out of the box by binding frameworks (since that makes it easier for people to pick up a schema and start using it). [This is not currently the case]
  2. that the schema be expressed in such a way as to generate the simplest code.