docx4java aka docx4j – OpenXML office documents in Java

Archive for the ‘OpenDoPE’ Category

OpenDoPE and XPath 2.0/3.0

January 3rd, 2019 by Jason

Docx4j generally uses Apache XPath (org.apache.xpath), from the Xalan 2.7.2 jar. (docx4j also uses Xalan, plus Xalan-specific extension functions, for XSLT in various places, including HTML export and OpenDoPE processing.)

There are 2 main places where docx4j uses XPath:

1. JaxbXmlPartXPathAware contains the method getJAXBNodesViaXPath, which – thanks to JAXB’s concept of a binder – you can use to select objects (say P objects) in your MainDocumentPart.
2. OpenDoPE content control data binding: XPath is central to this, since document content is bound to XML data via XPath.

XPath 2.0 became a W3C Rec in 2007; XPath 3.0 became a W3C Rec in 2014.

Sadly, Apache XPath has languished at XPath 1.0 level: https://intellectualcramps.wordpress.com/2009/01/12/xerces-getting-xpath-2-0-support/ and http://apache-xml-project.6118.n7.nabble.com/XSLT-2-0-td20898.html

Saxon, in contrast, has supported XPath 2.0 for ages, and also supports 3.1.

In docx4j 6.1.0 we made it easy for you to try Saxon for case 1 (JaxbXmlPartXPathAware getJAXBNodesViaXPath):

Step 1: add Saxon to your classpath, for example (Maven):

<dependency>
  <groupId>net.sf.saxon</groupId>
  <artifactId>Saxon-HE</artifactId>
  <version>9.9.0-2</version>
</dependency>

Step 2: add the following early in your code:

XPathFactoryUtil.setxPathFactory(new net.sf.saxon.xpath.XPathFactoryImpl());

In docx4j 6.1.0, this only affects case 1. OpenDoPE content control data binding would still use Apache XPath.

In docx4j 8.0.0, Saxon would also be used for OpenDoPE content control data binding.

An example: date comparison

You can add an OpenDoPE conditional content control, in which the content is inserted only if the XPath “xs:date(/invoice/date) > xs:date('2018-12-31')” is true. (Date comparison is harder in XPath 1.0: https://stackoverflow.com/questions/4347320/xpath-dates-comparison )

For this to work, you need the prefix mapping xmlns:xs='http://www.w3.org/2001/XMLSchema', so your XPath in the OpenDoPE XPaths part would look something like:

<xpath id="dateGt">
  <dataBinding xpath="xs:date(/invoice/date) &gt; xs:date('2018-12-31')"
    prefixMappings="xmlns:xs='http://www.w3.org/2001/XMLSchema'"
    storeItemID="{8B049945-9DFE-4726-9DE9-CF5691E53858}"/>
</xpath>

(for now, you need to manually edit the zipped docx to add that; I’ll update the authoring tools to do it in due course)

You can try this example right away:

get a docx4j 8.0 nightly: https://docx4java.org/docx4j/docx4j-8.0.0-SNAPSHOT-20190102.jar
add Saxon (as above)
bind invoice_Saxon_XPath2.docx from https://github.com/plutext/docx4j/blob/VERSION_8_0_0/sample-docs/word/databinding/invoice_Saxon_XPath2.docx using ContentControlBindingExtensions.java

Try changing the date in invoice-data.xml to, say, 2018-01-15, then observe the effect on the output docx.

Just to re-iterate, you need Saxon for this to work. Xalan’s XPath will cause an exception.
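For comparison, here is a minimal sketch of the sort of workaround XPath 1.0 forces on you, using only the XPath engine built into the JDK (no docx4j or Saxon involved). It assumes the date is in yyyy-MM-dd form at /invoice/date; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPath1DateComparison {

    // XPath 1.0 has no date type, so strip the hyphens from the
    // yyyy-MM-dd string and compare the result as a plain number.
    static final String AFTER_2018 =
            "number(translate(/invoice/date, '-', '')) > 20181231";

    // Hypothetical helper: true if the invoice date is later than 2018-12-31.
    static boolean isAfter2018(String invoiceXml) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath().evaluate(
                    AFTER_2018,
                    new InputSource(new StringReader(invoiceXml)),
                    XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isAfter2018("<invoice><date>2019-01-15</date></invoice>"));
        System.out.println(isAfter2018("<invoice><date>2018-01-15</date></invoice>"));
    }
}
```

This only works because the date format happens to sort numerically once the hyphens are removed; the xs:date form says what it means and does not depend on that trick.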

org.eclipse.wst.xml.xpath2.processor is an interesting possible alternative, but it is not in Maven Central, not as well-known as Saxon, and possibly not so easy to get support?


From VariableReplace to OpenDoPE data binding

April 28th, 2018 by Jason

This blog post is a walkthrough of how to easily move from variable replacement to OpenDoPE content control data binding.

Introduction

Variable replacement is quite a popular way to get started generating Word documents.

I guess that’s because developers expect this sort of approach to be available, and it’s easy: all you have to do is add your variables to the document, then bang, you replace them.

But it’s not all a bed of roses; there are some thorns in there too:

the so-called “split run” problem, in which Word has split your variable name across more than one XML element, due to formatting, spelling/grammar etc
variable replacement is great if you just want to replace text, but what if you want to replace images, conditionally include/exclude content, or repeat table rows or list entries?
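To make the split run problem concrete, here is a toy sketch in plain Java. It models a paragraph as a list of run texts (an assumption for illustration; it does not touch docx4j's object model) and uses a placeholder of the form ${name}. Replacing within each run separately works when the placeholder sits in one run, and silently does nothing when Word has split it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SplitRunDemo {

    // Naive variable replacement: substitute within each run's text on its own.
    static List<String> replacePerRun(List<String> runTexts, String placeholder, String value) {
        return runTexts.stream()
                .map(text -> text.replace(placeholder, value))
                .collect(Collectors.toList());
    }

    // True when the placeholder only shows up once the run texts are joined,
    // i.e. it straddles a run boundary and per-run replacement will miss it.
    static boolean isSplitAcrossRuns(List<String> runTexts, String placeholder) {
        boolean withinOneRun = runTexts.stream().anyMatch(text -> text.contains(placeholder));
        return !withinOneRun && String.join("", runTexts).contains(placeholder);
    }

    public static void main(String[] args) {
        List<String> intact = Arrays.asList("Dear ", "${name}", ",");
        List<String> split = Arrays.asList("Dear ${na", "me}", ",");

        System.out.println(replacePerRun(intact, "${name}", "Alice")); // placeholder replaced
        System.out.println(replacePerRun(split, "${name}", "Alice"));  // placeholder left as-is
        System.out.println(isSplitAcrossRuns(split, "${name}"));
    }
}
```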

Content control data binding is a great solution to these problems.

Your data (provided in XML format) is bound to content controls using XPath, and with the OpenDoPE conventions, this approach offers:

conditional inclusion/exclusion
repeats (Word 2013 has its own repeating content controls)
nesting of repeats and conditionals
add picture/image
import escaped XHTML

Some users create very complex contracts and reports this way.
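If the idea of binding via XPath is new to you, the following sketch may help. It uses only the JDK's XPath 1.0 engine and a made-up invoice document (neither is docx4j's implementation) to show the three kinds of question a template ends up asking of the data: what value goes here, is this condition true, and how many times does this repeat.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathBindingSketch {

    // Hypothetical data, invented for this example.
    static final String DATA =
            "<invoice>"
          + "<customer><name>Joe Bloggs</name></customer>"
          + "<items><item>Widget</item><item>Gadget</item></items>"
          + "</invoice>";

    private static Object evaluate(String xpath, QName returnType) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(DATA)), returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + xpath, e);
        }
    }

    // A plain bound content control: the XPath selects the text to show.
    static String boundValue(String xpath) {
        return (String) evaluate(xpath, XPathConstants.STRING);
    }

    // A conditional: the content is kept only if the XPath evaluates to true.
    static boolean condition(String xpath) {
        return (Boolean) evaluate(xpath, XPathConstants.BOOLEAN);
    }

    // A repeat: the content is repeated once per node the XPath selects.
    static int repeatCount(String xpath) {
        return ((NodeList) evaluate(xpath, XPathConstants.NODESET)).getLength();
    }

    public static void main(String[] args) {
        System.out.println(boundValue("/invoice/customer/name"));
        System.out.println(condition("count(/invoice/items/item) > 1"));
        System.out.println(repeatCount("/invoice/items/item"));
    }
}
```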

Automated Migration

The good news is that docx4j contains code to automatically migrate a document which has variables on its surface, to one which contains OpenDoPE content controls.

The code is in FromVariableReplacement.java

Have a look at the main method to see how to use it.

There have been some fixes recently, so you should use docx4j-nightly-20180428.jar (or 3.3.8 when released) or later.

OK, let’s assume you now have a docx file with content controls in it. You may want to further develop your template. For this you need an OpenDoPE Authoring tool.

OpenDoPE Authoring

The friendliest OpenDoPE authoring tool is the “No-XML” Word AddIn.

With this it is very easy to add conditions, but the limitation is that it assumes a fixed XML format. If you want to use your own XML format (or, I suspect, to bind escaped XHTML), you’ll need to use one of the older AddIns.

Here we’ll walk through adding a condition with the “No XML” AddIn.

For this example, we’ll use the Commonwealth of Australia’s model Confidentiality Agreement, available at http://www.business.gov.au/IPToolkit

Here’s what the first few blanks look like, represented as content controls with the “No XML” AddIn’s ribbon showing in Word:

I had pressed the “Show tags” button to be able to see the content controls in orange above.

Further down, there’s an optional Indemnity clause.

Since it’s optional, let’s wrap that in a conditional content control. First, we need a question: “Do you want the Indemnity”. It works this way because this AddIn is aimed primarily at the interactive use case, that is, a user can answer questions in their web browser to generate an instance document.

But the resulting template can be used just as easily for the more common non-interactive / entirely programmatic case.

So click the “Insert Q/A” button. I did this with my cursor somewhere in the middle of the Indemnity clause.

Fill in the form (for Multiple Choice choose yes):

Click Next, then on the next page, choose type boolean (true/false), then OK.

You’ll see a content control inserted where your cursor was. We don’t really want that, so it’s a bit annoying (you can/should delete it). You’ll see why we did this just below.

Now select the clause heading and body, and click “Wrap with Condition”. You’ll see something like:

In the condition builder, define the following condition:

then click OK. (Now you can see why we needed to set up that question first)

Our resulting conditional clause appears as follows:

That’s all you need to do. We can now try generating an instance document from this template.

Document generation runtime

To generate a document, use docx4j code to populate an Answers object, then call Docx4J.bind. For example:

WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
        .load(new java.io.File(inputfilepath));

answers = new Answers();

addAnswer("Sponsor_name_ACNABN_oW", "CSIRO of Some St, Sydney");
addAnswer("want__Indemnity_clause_K8", "true"); // or "false"
// etc

// Flags are combined with bitwise OR
Docx4J.bind(wordMLPackage, answers, Docx4J.FLAG_BIND_INSERT_XML | Docx4J.FLAG_BIND_BIND_XML);
Docx4J.save(wordMLPackage, new File(outputfilepath), Docx4J.FLAG_NONE);

where addAnswer is just:

private void addAnswer(String key, String value) {
    Answer a = new Answer();
    a.setId(key);
    a.setValue(value);
    answers.getAnswerOrRepeat().add(a);
}
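A side note on the flags argument passed to Docx4J.bind: as a general Java idiom, bit flags are combined with bitwise OR (|), and a combined value is tested with bitwise AND (&). The sketch below uses stand-in constants with assumed values, not docx4j's own, just to show the arithmetic.

```java
class BindFlagsSketch {

    // Hypothetical single-bit flags; the values are assumptions for illustration.
    static final int INSERT_XML = 1; // binary 001
    static final int BIND_XML = 2;   // binary 010
    static final int REMOVE_SDT = 4; // binary 100

    // Tests whether a particular flag is present in a combined value.
    static boolean isSet(int flags, int flag) {
        return (flags & flag) == flag;
    }

    public static void main(String[] args) {
        int combined = INSERT_XML | BIND_XML; // both bits set
        int mistaken = INSERT_XML & BIND_XML; // no bits in common, so nothing is set

        System.out.println(isSet(combined, INSERT_XML));
        System.out.println(isSet(combined, REMOVE_SDT));
        System.out.println(mistaken);
    }
}
```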

How do you know what key to use? Look in the answers part in the docx and use the corresponding ID. (Yes, you should be able to see this in the AddIn; you can’t because in the interactive use case you never need to know.) Alternatively, just invoke Docx4J.bind with debug level logging enabled for org.docx4j.model.datastorage, and it will print out the relevant part.

That’s about it. If you have questions, they are probably best posted in the relevant docx4j forum or on StackOverflow.


OpenDoPE Word Add-In source code released

August 13th, 2011 by Jason

The source code for the OpenDoPE Word Add-In developer edition is available at last at http://opendope.codeplex.com/

(A binary download has been available for 10 months or so now)

OpenDoPE stands for Open Document Processing Ecosystem; it’s a standards-based approach to document automation / document assembly.

Fundamentally, it is a set of conventions for doing document assembly using Open XML (the ISO-standard Microsoft Word docx file format), specifically, its content control databinding architecture.

Its real attraction is that it enables users to do document production without getting locked in to some vendor’s proprietary file format: in adopting OpenDoPE, you aren’t making any commitment above and beyond continued use of the docx file format, and a conventional approach to use of its content controls.

For further details, see the OpenDoPE website.

docx4j can combine an XML data file with an OpenDoPE docx template for you; the point of the OpenDoPE Word Add-In is to help your authors with the initial step of creating OpenDoPE docx templates.

The Word Add-In is relatively straightforward; it uses VSTO (Visual Studio Tools for Office). You’ll need Visual Studio (2010) and basic C# skills to modify it.

The point of releasing the source code is to make it easy for developers to contribute back fixes and enhancements (which has worked really well for docx4j), or to extend the Add-In to create their own specialised authoring tool. The source code is in Mercurial, which – because of its distributed nature – should facilitate the latter especially.

The source code for the OpenDoPE Word Add-In (developer edition) is dual licensed, the primary license being GPL v2.

The Add-In is made possible because of the availability of the SharpDevelop “Avalon” and XML editor components. Thanks guys!

