Archive for September, 2015

Aspose.Confusion in Words

September 26th, 2015 by Jason

In June, Aspose’s Shoaib Khan published a blog post purporting to cover features available in Aspose.Words for Java but not docx4j.

It is either breathtaking or amusing in its inaccuracy, depending on whether you think it was born of deceit or ineptitude.  Either way, its a caution to anyone considering drinking the Aspose.Kool-Aid!

Here I’ll go through his claims one by one.

As a general comment though, it is worth remembering that with docx4j, you can do pretty much anything the docx/pptx/xlsx file formats allow.  If docx4j doesn’t have a high level API for something you want to do, you can always implement it yourself, thanks to docx4j’s lower level JAXB-based APIs.  And docx4j is real ASLv2 open source, so you can use the source Luke!

Without further ado…

Set Page Borders.
Here Aspose seems to be talking about section properties (ie margins etc).

Shoaib implies you can’t control these in docx4j.  Of course you can!  You can add or remove sections, or modify the settings of an existing section.

Track Changes in Documents.
Their AcceptAllRevisions method is said to be similar to Word’s “Accept All Changes”.

Docx4j doesn’t provide a high level API for doing this (since users haven’t asked for it), but a user could implement this for his/herself easily enough using XSLT or docx4j’s TraversalUtil.  You could start with this XSLT

Using Control Characters.
This example is a bit bizarre, because in a docx, specific elements w:br and w:cr are used for line breaks;  OpenXML follows the usual XML rules for whitespace.

Split Tables.
This example shows the steps a user would follow to split a table into two.  Basically, clone the existing table to make a new table with the same properties, then move rows from the first table to the second.

Of course you can do the same thing in docx4j!

Repeat Table Header Rows on Pages.
This example is just about setting the header row property.  See http://stackoverflow.com/questions/14605264/repeat-table-header

Clone Documents.
Cloning a document.  Aspose suggest you can’t do this with docx4j? WTF?! OpcPackage’s clone() method has been there since 2.7.1

docx4j also includes code for making partial copies, where less than a full clone is required.

Moving the Cursor in Document.
OpenXML is of course XML, which is hierarchical.  docx4j uses JAXB to give you an object model representation of that.  The hierarchical structure is basically nested lists.  Lots of stuff boils down to finding a position in a list, and then inserting or deleting etc using the Java collections API.  To find that list/position, you’d typically use docx4j’s powerful traversal functionality.   Or you can use XPath (in JAXB, the objects are bound to the underlying XML).

Aspose has some notion of cursor position, so you can move to the start or end of the document.  This may appeal to people with a VBA background, but in practice it is of little use.

Protect Documents.
Whilst it is true that in docx4j 3.2.0 there was no high level API for the functionality Microsoft Word groups under Protect Document (Mark as Final, Encrypt with Password, Restrict Editing etc), this is available now in the 3.3.0 previews:

Working with Digital Signatures.
As with the other Protect Document features, there was no high level API in 3.2.0.  That’s not to say you couldn’t do it, but there’s a nice API for this in the commercial Enterprise Ed. (forthcoming v3.3.0)

Check Format Compatibility.
This seems to be restricted to knowing what type of document you are working with; Aspose says it doesn’t validate the file format.

Per the specs, an OPC package has a content type.  In docx4j, docx/dotx/docm etc are all represented by WordprocessingMLPackage, but you can distinguish between them by calling getContentType(); the value will be one of:

  • WORDPROCESSINGML_DOCUMENT
  • WORDPROCESSINGML_DOCUMENT_MACROENABLED
  • WORDPROCESSINGML_TEMPLATE
  • WORDPROCESSINGML_TEMPLATE_MACROENABLED

Load Text File.
Whoopey do.  Apparently you can import plain text using Aspose’s expensive software!

Of course that is trivial with docx4j.  Docx4j also supports converting altChunks to native WordML content.  For XHTML altChunks, you need docx4j-ImportXHTML;  support for altChunks of type docx is an Enterprise level feature.

Specify Default Fonts.
The way the default font works in WordprocessingML is more complicated than you might expect, in that there are a few different ways you could affect it (via the theme part or the styles part).

That said, in real life, this doesn’t tend to be a problem.  With docx4j, you can easily set  w:docDefaults/w:rPrDefault/w:rPr/w:rFonts in your styles part, if you want to.

Working with Tables:
Autofit Setting to Tables.
This is just about the tblLayout setting: w:tblPr/w:tblLayout/@w:type, which you can access via TblPr’s get/setTblLayout

Joining Tables in Document.
I can’t recall anyone ever asking for docx4j to provide a high level API to do this, but it could be added.  In the meantime, docx4j allows to you to do anything with tables which the file format allows, including joining tables.

Mail Merge
Mail Merge from XML Data Source.
docx4j provides a high level API for working with legacy MERGEFIELD fields.

If you wanted to fill those fields with data from XML, you could do that easily enough.

Where docx4j really shines though, is in its support for content control data binding.  In that approach, introduced by Microsoft in 2007, you have a bidirectional XPath mapping between content controls in the document, and an XML file.

If you are working with XML, and not forced to work with legacy MERGEFIELDs for some reason, content control data binding is the way to go.