Enhancing more complicated xpath expressions

Posted: Fri Jun 17, 2011 3:07 am
by tinne
Given a document with the following properties:

There is a repeat tag, say, referencing the xpath /document/section. There are nested conditionals referencing nested elements, say, /document/section/paragraph/run.

Using OpenDoPEHandler, these nested conditionals are enhanced whenever the handler considers them nested, that is, whenever they start with /document/section. This has evidently been tested thoroughly with nested repeats, as all sorts of paths such as /document/section/paragraph[7]/run, with or without fixed indexes, are properly enhanced (to /document/section[index]/paragraph[7]/run).

However, a condition can also be a compound expression, which after enhancement should read like "count(/document/section[index]/paragraph[7])>0 and string(/document/section[index]/nutrition)='wisdom'", and on such expressions the handler fails.
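
To make the wanted rewrite concrete, here is a minimal sketch in Java. It is not the OpenDoPEHandler code and not the attached patch: the class name NaivePrefixEnhancer and the method enhance are made up for illustration, and the approach is plain regex replacement over the whole expression, so it would also rewrite a path that happens to appear inside a string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of docx4j).
final class NaivePrefixEnhancer {

    // Characters that would mean the prefix is part of a longer name.
    private static final String NAME_CHAR = "A-Za-z0-9_.\\-";

    /**
     * Inserts "[index]" after every occurrence of repeatPrefix in xpath,
     * provided the occurrence is a complete path on its left side and a
     * complete step on its right side, and has no predicate yet.
     */
    static String enhance(String xpath, String repeatPrefix, int index) {
        Pattern p = Pattern.compile(
                "(?<![" + NAME_CHAR + "])"          // not in the middle of a longer path
                + Pattern.quote(repeatPrefix)
                + "(?![" + NAME_CHAR + "\\[])");    // step ends here, no predicate yet
        String replacement = repeatPrefix + "[" + index + "]";
        return p.matcher(xpath).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String test = "count(/document/section/paragraph[7])>0"
                + " and string(/document/section/nutrition)='wisdom'";
        System.out.println(enhance(test, "/document/section", 3));
    }
}
```

Both occurrences of /document/section inside the function calls get the index, while something like /document/sections/x is left alone. Anything beyond that (whitespace inside paths, literals, relative paths) is where a real XPath parser becomes necessary.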

In principle, this would need an xpath 1.0 parser to fully handle. I found one at an antlr blog, but maybe better alternatives exist.

For the moment, I did some refactoring to handle just the cases noted above: and-clauses and function calls on path expressions, see attachment.
OpenDoPEHandler.patch.txt
Additions to support a few more nested conditions
(8.99 KiB) Downloaded 272 times


The attachment is a bit clumsy, as it also contains logging fixes marked with the comment "log4j; JAXB "RI"??"; just scroll to the bottom for the important parts.

Any suggestions on how to generalize this (using the above grammar or otherwise)?

Re: Enhancing more complicated xpath expressions

Posted: Mon Jun 20, 2011 5:44 am
by tinne
Realistically, this could not stay this way. I took Jan-Willem van den Broek's grammar and built it into a rewriting parser that enhances xpath expressions just the way that is needed. Thus, all xpath 1.0 expressions can be used.

In order to do this, antlr and stringtemplate need to be added to the dependencies. I fixed the POM but didn't touch the ant build file; I guess updating it is an easy exercise.

A few words on the method: all absolute path expressions are found and enhanced whenever they match the given prefix. The match is still a plain string comparison, so whitespace matters, and bad rewrites such as /some/greater/path -> /some/great[17]er/path are possible; avoiding them would require comparing the syntax tree instead of the token stream. Then again, we should know our docx template files.
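
The bad rewrite mentioned above is easy to reproduce outside the parser. The following Java sketch is not taken from the patch; PrefixMatchDemo, naive and stepAware are hypothetical names, and the input is assumed to be a single absolute path that has already been extracted from the expression. It contrasts a pure startsWith check with one that additionally requires the prefix to end at a step boundary.

```java
// Hypothetical demo class, for illustration only (not part of the patch).
final class PrefixMatchDemo {

    /** Pure string-prefix match: rewrites whenever path starts with prefix. */
    static String naive(String path, String prefix, int index) {
        if (!path.startsWith(prefix)) {
            return path;
        }
        return prefix + "[" + index + "]" + path.substring(prefix.length());
    }

    /** Same, but only if the prefix ends exactly where a location step ends. */
    static String stepAware(String path, String prefix, int index) {
        if (!path.startsWith(prefix)) {
            return path;
        }
        String rest = path.substring(prefix.length());
        // The remainder must be empty or start a new step; anything else
        // (more name characters, an existing predicate) is left untouched.
        if (!rest.isEmpty() && rest.charAt(0) != '/') {
            return path;
        }
        return prefix + "[" + index + "]" + rest;
    }

    public static void main(String[] args) {
        System.out.println(naive("/some/greater/path", "/some/great", 17));
        System.out.println(stepAware("/some/greater/path", "/some/great", 17));
        System.out.println(stepAware("/some/great/path", "/some/great", 17));
    }
}
```

The boundary check only addresses the great/greater case. Whitespace differences such as "/some / great" would still defeat a string comparison, which is why comparing syntax trees would be the robust option.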

docx4j-2.7.0-SNAPSHOT-XPathEnhancer.patch.zip
Grammar based XPath enhancement for the OpenDoPE handler
(22.68 KiB) Downloaded 224 times

Re: Enhancing more complicated xpath expressions

Posted: Mon Jun 20, 2011 9:49 am
by jason
Nice work Tinne; thanks very much for this :-)

Applied as http://dev.plutext.org/trac/docx4j/changeset/1547

It would be nice to have some JUnit tests on this, but I am as guilty as anyone in not writing enough tests :-(

Re: Enhancing more complicated xpath expressions

Posted: Mon Jun 20, 2011 9:28 pm
by tinne
I wrote some, actually, but originally did not copy them over, since my docx4j-trunk project runs on a machine still on Eclipse Galileo, that is, without the ANTLR IDE.
You may wish to adjust the package declaration... (oops, in XPathEnhancer.g as well...)
xpathextend-tests.zip
A few lines of JUnit to test the XPath-Enhancer
(1.59 KiB) Downloaded 211 times

Re: Enhancing more complicated xpath expressions

Posted: Wed Jun 22, 2011 2:34 pm
by jason
Thanks for these Tinne; see http://dev.plutext.org/trac/docx4j/changeset/1558 and 1557.

Re: Enhancing more complicated xpath expressions

Posted: Wed Jun 22, 2011 11:58 pm
by tinne
Unfortunately, the maven build now fails, as it runs TestConfiguration as a junit test class in its own right. According to the maven documentation, classes matching these patterns are treated as tests:

<includes>
<include>**/Test*.java</include>
<include>**/*Test.java</include>
<include>**/*TestCase.java</include>
</includes>

cf. http://maven.apache.org/plugins/maven-s ... l#includes

Hence renaming TestConfiguration to XPathConfiguration (and ImageTypeBmp to ImageTypeBmpTest) fixes the build.
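
For a quick check of which names are affected, the include patterns quoted above can be approximated on simple class names. This Java sketch is only an approximation under the assumption that the directory part (**/) and the .java suffix can be ignored; it does not use any Surefire API, and SurefireNameCheck and looksLikeTest are made-up names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not how Surefire is implemented.
final class SurefireNameCheck {

    // Class-name parts of the three include patterns quoted above.
    private static final List<Pattern> INCLUDES = Arrays.asList(
            Pattern.compile("Test.*"),       // **/Test*.java
            Pattern.compile(".*Test"),       // **/*Test.java
            Pattern.compile(".*TestCase"));  // **/*TestCase.java

    /** Returns true if the simple class name matches any include pattern. */
    static boolean looksLikeTest(String simpleClassName) {
        for (Pattern p : INCLUDES) {
            if (p.matcher(simpleClassName).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String name : Arrays.asList("TestConfiguration", "XPathConfiguration",
                "ImageTypeBmp", "ImageTypeBmpTest")) {
            System.out.println(name + " -> " + looksLikeTest(name));
        }
    }
}
```

Under that assumption, TestConfiguration is picked up because of the Test* pattern, XPathConfiguration is not, and ImageTypeBmp only becomes a test once it is renamed to ImageTypeBmpTest.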

Re: Enhancing more complicated xpath expressions

Posted: Thu Jun 23, 2011 1:04 am
by jason
Ah yes, the Maven build.

Thanks for that; done as http://dev.plutext.org/trac/docx4j/changeset/1561