
Roundtripping and consolidation

Posted: Wed Jun 29, 2011 2:59 am
by tinne
We have discussed binding of roundtripped documents before, and I did some work on it. If you think it's worthwhile, I'd happily suggest another release candidate.

At the moment, a repeat SDT is multiplied into several SDTs with their tags stripped, so none of these SDTs is tagged any more. I changed this so that the first SDT is left unchanged and the following ones are marked with an empty repeat tag, "od:repeat=". On a second run over this document, the first SDT is then re-used to re-create the repeat (the trailing [1] being removed from the XPath), and the others are removed. Thus, roundtripping repeats works as long as no repeat ever encounters an empty node set, which would remove the whole document part as before.

The zero-repeats case corresponds to the false-condition case, so these two missing cases can be fixed in a later addition, e.g., by turning text into deleted text or vice versa. This is not yet implemented.
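
To make the second-run rule concrete, here is a minimal sketch of how I read the description above. It works on plain tag strings rather than on docx4j objects; the class and method names and the repeat id "r1" are hypothetical and not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names are not taken from the patch.
class RepeatRoundtripSketch {
    // Marker described in the post for the second and later copies of a repeat.
    static final String EMPTY_REPEAT = "od:repeat=";

    // Second run: copies carrying the empty repeat tag are dropped,
    // every other SDT tag is kept.
    static List<String> dropMarkedCopies(List<String> tags) {
        List<String> kept = new ArrayList<>();
        for (String tag : tags) {
            if (!EMPTY_REPEAT.equals(tag)) {
                kept.add(tag);
            }
        }
        return kept;
    }

    // Removes one trailing "[1]" so the XPath addresses the whole node set again.
    static String stripTrailingFirstIndex(String xpath) {
        return xpath.endsWith("[1]") ? xpath.substring(0, xpath.length() - 3) : xpath;
    }
}
```

With the tags "od:repeat=r1", "od:repeat=", "od:repeat=" only the first one survives, and an XPath such as /invoice/item[1] becomes /invoice/item before the repeat is evaluated again.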

Then, I've created a RemovalHandler, which can remove SDTs of a certain tag type from a document. It is similar to the BindingHandler in implementation, tackling one part at a time. You tell it to remove some subset of od:condition, od:repeat, and od:xpath SDTs, and it replaces all SDTs found with their content. The default behavior is to remove conditions and repeats, leading to a document with only simple field bindings, but you can just as well run the BindingHandler on the part first and then remove every SDT in order to get a plain document.
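
The "replace an SDT with its content" step can be sketched as follows. This is not the RemovalHandler from the patch: it uses the plain JDK DOM API instead of docx4j's object model, and the class name SdtUnwrapSketch and its methods are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not the RemovalHandler from the patch.
class SdtUnwrapSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns the w:tag value of an SDT, or "" if it has none.
    static String tagOf(Element sdt) {
        for (Node pr = sdt.getFirstChild(); pr != null; pr = pr.getNextSibling()) {
            if (pr instanceof Element && "sdtPr".equals(pr.getLocalName())) {
                NodeList tags = ((Element) pr).getElementsByTagNameNS(W, "tag");
                if (tags.getLength() > 0) {
                    return ((Element) tags.item(0)).getAttributeNS(W, "val");
                }
            }
        }
        return "";
    }

    // Replaces each w:sdt whose tag starts with one of the prefixes
    // by the children of its w:sdtContent; returns the number removed.
    static int unwrap(Document doc, String... prefixes) {
        NodeList live = doc.getElementsByTagNameNS(W, "sdt");
        List<Element> sdts = new ArrayList<>();   // snapshot, the NodeList is live
        for (int i = 0; i < live.getLength(); i++) sdts.add((Element) live.item(i));
        int removed = 0;
        for (Element sdt : sdts) {
            String tag = tagOf(sdt);
            boolean match = false;
            for (String p : prefixes) match |= tag.startsWith(p);
            if (!match) continue;
            Node parent = sdt.getParentNode();
            for (Node c = sdt.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element && "sdtContent".equals(c.getLocalName())) {
                    while (c.getFirstChild() != null) {
                        parent.insertBefore(c.getFirstChild(), sdt); // moves the node
                    }
                }
            }
            parent.removeChild(sdt);
            removed++;
        }
        return removed;
    }
}
```

Calling unwrap(doc, "od:condition=", "od:repeat=") corresponds to the default described above and leaves od:xpath SDTs alone; passing all three prefixes yields a plain document.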

As a result of the removal, formatting applied to the SDT as a whole is lost, as I found no easy way to replicate the content of the sdtEndPr into the sdtContent nodes. Simply copying it to each rPr/pPr/tcPr etc. would not do, as overrides already present on those elements might require omitting parts of it. So, if anyone has an idea, ...

docx4j-2.7.0-SNAPSHOT-roundtrip1.patch.zip
The attached patch contains two minor fixes to the Maven build and the Context. I've also changed the parts of XmlUtils used by the new code so that they no longer throw java.lang.Exception.

Re: Roundtripping and consolidation

Posted: Thu Jun 30, 2011 3:51 am
by tinne
There was a problem in OpenDoPEHandler regarding the binding of empty table cells. The handler already contained code to handle case 1:
Code:
<w:sdt>
  <w:sdtPr><w:tag w:val="od:condition=..."/></w:sdtPr>
  <w:sdtContent><w:tc>...</w:tc></w:sdtContent>
</w:sdt>

In this case, when the condition is false, the handler replaces the SDT with an empty cell:
Code:
<w:tc><w:p/></w:tc>

The handler was broken regarding case 2:
Code:
<w:tc>
  <w:sdt>
    <w:sdtPr><w:tag w:val="od:condition=..."/></w:sdtPr>
    <w:sdtContent>...</w:sdtContent>
  </w:sdt>
</w:tc>

In this case, the handler removed the inner SDT, leaving us (illegally) with a <w:tc/> containing no content whenever the SDT node had no sibling content.

I added support for the second case, adding an empty paragraph whenever the parent of a removed SDT is a tc and there are no content siblings.
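
The rule can be sketched like this, again with the plain JDK DOM API rather than the docx4j object model the patch works on. EmptyCellFixSketch and removeSdt are hypothetical names, and treating w:tcPr as "not content" is my assumption about what counts as a content sibling.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not the code from the patch.
class EmptyCellFixSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Removes the SDT; if its parent is a w:tc that is left without content,
    // an empty w:p is appended so the cell stays valid.
    static void removeSdt(Element sdt) {
        Node parent = sdt.getParentNode();
        parent.removeChild(sdt);
        if (!(parent instanceof Element) || !"tc".equals(parent.getLocalName())) {
            return; // only table cells need the placeholder paragraph
        }
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            // Assumption: w:tcPr holds cell properties and is not content.
            if (c instanceof Element && !"tcPr".equals(c.getLocalName())) {
                return; // the cell still has a content sibling
            }
        }
        parent.appendChild(parent.getOwnerDocument().createElementNS(W, "w:p"));
    }
}
```

A cell that holds only the SDT (plus an optional w:tcPr) ends up as a cell with a single empty paragraph; a cell with other content is left as it is.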

Going back to case 1, I cannot quite understand why one should always want table cells to be swept clean on a negative condition. What if I want to remove a column based on a condition? I could only replicate the whole table with and without the column and choose between the copies globally, which is quite a chore if there is more than one such column. Thus, I added a global option to support optionally removing entire cells based on a condition. Note that if column widths are configured globally for the table, this could lead to geometry issues. Thus, the current version of the patch mainly works with dynamic-width tables.
docx4j-2.7.0-SNAPSHOT-roundtrip2.patch.zip
The code for these further changes is contained in this patch, replacing(!) the version above.

Re: Roundtripping and consolidation

Posted: Thu Jun 30, 2011 3:10 pm
by jason
Hi Tinne

Thanks for these; I've applied them as http://dev.plutext.org/trac/docx4j/changeset/1579 (except the pom.xml change - see my post in the parent forum about that)

I think we should put this in 2.7.0; not sure whether another rc is necessary though .. we can wait a bit to see whether anything else crops up.

tinne wrote:As a result of the removal, formatting applied to the SDT as a whole is lost, as I found no easy way to replicate the content of the sdtEndPr into the sdtContent nodes. Simply copying it to each rPr/pPr/tcPr etc. would not do, as overrides already present on those elements might require omitting parts of it.


I looked at its description at http://msdn.microsoft.com/en-us/library ... rties.aspx and played around with it in Word 2007. I don't think we need to worry about it at all, since this tag doesn't seem to affect the formatting of the contents of the sdt. The most I could get it to do in Word 2007: when I positioned my cursor immediately before the magic sdtendchar and typed something, the first thing I typed caused the final paragraph character to appear in that format (the letter I typed appeared *after* the sdt); subsequent keystrokes also appeared in that format, but again outside the sdt. This testing was on a rich-text, paragraph-level sdt.

tinne wrote: I added a global option to support optionally removing entire cells based on a condition.


I can imagine a user wanting this to be set differently document to document, or even table to table within a document.

So I'm wondering how we could best change the convention to allow this.

One option would be to have it as an optional setting in <w:tag w:val="od:condition=..."/>

Perhaps <w:tag w:val="od:condition=...&amp;removeIfEmpty=true"/>

Calling it "removeIfEmpty" might open the way to similar semantics for other empty objects (rows? text boxes?).
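
A minimal sketch of how such a tag value could be read, with the global option acting as the fallback when the tag does not say anything. The class TagOptionsSketch and its methods are hypothetical; this is not an existing docx4j API, and "c1" is a made-up condition id.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not an existing docx4j class.
class TagOptionsSketch {
    // Splits a tag value such as "od:condition=c1&removeIfEmpty=true"
    // into key/value pairs, keeping their order.
    static Map<String, String> parse(String tagValue) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String pair : tagValue.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            if (eq < 0) {
                out.put(pair, "");
            } else {
                out.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return out;
    }

    // The per-SDT setting wins; otherwise the global option applies.
    static boolean removeIfEmpty(String tagValue, boolean globalDefault) {
        String v = parse(tagValue).get("removeIfEmpty");
        return v == null ? globalDefault : Boolean.parseBoolean(v);
    }
}
```

With this, removeIfEmpty("od:condition=c1&removeIfEmpty=true", false) is true, while a tag without the setting simply follows the global flag. Note that in the document XML the ampersand has to be written as &amp;; the parser hands the attribute value over with a plain &.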

If you wanted to use this approach to get rid of a column, it might be a bit of a pain putting this property on each tc.

So alternatively, we could put the whole table in an sdt, and use that sdt to provide a table-wide setting for this property. I don't think I'd do this unless there were other table level properties we wanted to be able to alter. (Maybe we could provide a way of doing column oriented processing at this level??)

If we can work something out here, I'd like to do this for 2.7.0.

Finally, I will try to find time to write a couple of unit tests for these changes (unless you have some already?).

thanks again .. Jason

Re: Roundtripping and consolidation

Posted: Thu Jun 30, 2011 8:25 pm
by tinne
I'm afraid I have no ready-made unit-tests at the moment. The whole point is that OpenDoPEHandler is not only not thread-safe, but mostly untestable. I'd like to break it down into smaller testable parts and make it work without static members, but I'm not sure when I'll find time for this.

jason wrote:I can imagine a user wanting this to be set differently document to document, or even table to table within a document.

So I'm wondering how we could best change the convention to allow this.


I doubt that this is necessary at all, because a user can always use the w:tc/w:sdt/w:p approach if she wishes to retain the cell on a condition-based SDT removal. Thus, in effect, only removeIfEmpty=true is necessary at all; the global flag is there for backward compatibility only (if you built on trunk or nightly, your documents won't change without notice).

jason wrote (regarding <w:sdtEndPr/>): I don't think we need to worry about it at all, since this tag doesn't seem to affect the formatting of the contents of the sdt.

As far as I can see, if a document is under your control, you can always cope without sdtEndPrs by applying these properties to all runs in sdtContent. They are a bit like cascading stylesheets in Word. This should just stand there as a caveat: if someone does use them, a processed document will behave differently from the original, contrary to what he might expect.

Re: Roundtripping and consolidation

Posted: Fri Jul 01, 2011 10:35 am
by tinne
I'm missing the unit tests as well. I found out that there is a case 2b:

Given a w:tc/w:sdt[contains(@w:tag, 'od:repeat=')] whose repeat yields an empty result set: damn, there is the empty cell again. Time for another refactoring, now introducing "eventually empty lists", which are empty lists whenever this does not produce a w:tc without content. Today, I sometimes thought that a post-processor which fills in empty table cells might have done better. Anyway, problem solved, see attached.

Re: Roundtripping and consolidation

Posted: Sun Jul 03, 2011 4:06 pm
by jason
Applied - see http://dev.plutext.org/trac/docx4j/changeset/1583

btw, for a week or 2 now, you should have been able to attach a .patch text file - no need to zip up if things are working properly.

Re: Roundtripping and consolidation

Posted: Sun Jul 03, 2011 8:14 pm
by jason
tinne wrote:I'm afraid I have no ready-made unit-tests at the moment. The whole point is that OpenDoPEHandler is not only not thread-safe, but mostly untestable.


See now http://dev.plutext.org/trac/docx4j/brow ... dTest.java for a couple of basic tests.

Not really unit tests - more integration level - but useful for checking things are working as expected.

This basic strategy of processing a known input docx, then checking that the output is as expected (which I do via XPath), should also work for the more complex cases, and with any luck will still be usable after a future refactoring (eg to fix the thread safety issue).
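
As a rough illustration of the "check the output via XPath" half of that strategy: the sketch below counts nodes in an XML string with the JDK's XPath API. It is not the linked test class. OutputCheckSketch is a hypothetical name, the input here is a bare XML string rather than a processed docx, and local-name() is used only to avoid setting up a namespace context.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not the test class linked above.
class OutputCheckSketch {
    // Returns how many nodes the XPath expression selects in the given XML.
    static int count(String xml, String xpath) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate("count(" + xpath + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

A check for the empty-cell problem discussed earlier in this thread could then assert that count(output, "//*[local-name()='tc'][not(*[local-name()='p'])]") is 0 for the processed main document part.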