
Howto correctly transform content controls with XML data

Posted: Fri Mar 02, 2012 2:40 am
by mortomanos
Hi,

I'm using .docx files with content controls, together with the OpenDoPE Add-In, to create Word templates that get filled from our applications. The conversion process takes the template, injects a different custom XML part for the data binding, and regenerates the docx. This Word document is then automatically placed in an ECM system, which generates a preview version of it.

And here is the problem: the preview seems to be rendered from the original content, that is, from the data that was originally in the XML file (and therefore in the template), rather than from the re-injected XML data. I cannot see why this happens, since when I open the document in Word, I see the correct data. And when I save the file to a different location using Word's "Save As..." and put that copy into the ECM system, the preview is rendered correctly.

What is wrong here? Am I missing some steps that are necessary when transforming the file? I'm using these lines of code to transform the Word document:

Code:
         OpenDoPEHandler openDopeHandler = new OpenDoPEHandler(wordMLPackage);
         openDopeHandler.preprocess();
         OpenDoPEIntegrity openDopeIntegrity = new OpenDoPEIntegrity();
         openDopeIntegrity.process(wordMLPackage);


Any help is appreciated!

Regards,
Michael

Re: Howto correctly transform content controls with XML data

Posted: Fri Mar 02, 2012 11:17 am
by jason
The reason it looks right when you open the docx in Word, is that Word is applying the bindings for you.

If you want docx4j to do that, you need to do something like:

Code:
      // Apply the bindings
      //BindingHandler.setHyperlinkStyle("Hyperlink");
      BindingHandler.applyBindings(wordMLPackage);


or, if all your content controls are in your main document part (as opposed to headers/footers etc.), just:

Code:
      BindingHandler.applyBindings(wordMLPackage.getMainDocumentPart());


Do this after your existing code. See the ContentControlBindingExtensions sample for further details, and the note about processing hyperlinks (to the effect that if you are processing hyperlinks, you'll need to strip the content controls, using RemovalHandler).

hope this helps .. Jason
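
To illustrate why the ECM preview can show old data: until the bindings are applied, the text stored inside each content control is whatever was there when the template was last saved. The following is only a rough sketch using the JDK's XML APIs, not docx4j and not how BindingHandler works internally. It assumes document.xml and the custom XML part have already been read into strings, that the custom XML uses no namespaces, and that the controls are plain text and not nested. The class and method names are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists bound content controls whose stored text
// no longer matches the value in the custom XML part.
class StaleBindingCheck {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    // Returns the XPath of every bound control whose stored text differs
    // from the value that XPath selects in the custom XML.
    static List<String> staleControls(String documentXml, String customXml) {
        Document doc = parse(documentXml);
        Document data = parse(customXml);
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> stale = new ArrayList<>();
        NodeList sdts = doc.getElementsByTagNameNS(W_NS, "sdt");
        for (int i = 0; i < sdts.getLength(); i++) {
            Element sdt = (Element) sdts.item(i);
            NodeList bindings = sdt.getElementsByTagNameNS(W_NS, "dataBinding");
            if (bindings.getLength() == 0) {
                continue; // control without a data binding
            }
            String path = ((Element) bindings.item(0)).getAttributeNS(W_NS, "xpath");
            String bound;
            try {
                bound = xpath.evaluate(path, data);
            } catch (Exception e) {
                throw new IllegalArgumentException("Bad XPath: " + path, e);
            }
            // Concatenate all w:t text found under this control.
            StringBuilder shown = new StringBuilder();
            NodeList texts = sdt.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                shown.append(texts.item(j).getTextContent());
            }
            if (!bound.equals(shown.toString())) {
                stale.add(path);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        String doc = "<w:document xmlns:w='" + W_NS + "'><w:body><w:sdt><w:sdtPr>"
            + "<w:dataBinding w:xpath='/order/customer' w:storeItemID='{A}'/></w:sdtPr>"
            + "<w:sdtContent><w:p><w:r><w:t>Template Customer</w:t></w:r></w:p>"
            + "</w:sdtContent></w:sdt></w:body></w:document>";
        String injected = "<order><customer>ACME GmbH</customer></order>";
        System.out.println("Stale controls: " + staleControls(doc, injected));
    }
}
```

A non-empty result means a consumer that does not apply bindings itself (such as a preview renderer) would presumably show the old text, which matches the symptom described in the first post.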

Re: Howto correctly transform content controls with XML data

Posted: Mon Mar 05, 2012 11:24 pm
by mortomanos
Hi Jason,

thanks for your quick reply, this solved my problem. Now for the next task: merging two documents that are generated from the same template with different content, so that the main document part content of doc B is appended to the main document part of doc A. Headers and footers should not be merged.
Is there a way to do this? Shall I open another thread?

Thanks and best regards,
Michael

Re: Howto correctly transform content controls with XML data

Posted: Tue Mar 06, 2012 1:10 am
by jason
Hi Michael

You can merge your main document parts easily using docx4j, provided the content is just plain text and tables. Things will also be fine if the second document uses styles and numbering, since in your case these are defined in the first document.

However, if your doc B contains hyperlinks, images, or anything else which is based on an explicit relationship to another part, you will need to handle these (this includes the case where you have sections in your docx, and the sections include header/footer relationships).

Plutext offers a commercial extension called MergeDocx which handles this stuff for you; see http://www.docx4java.org/blog/2010/11/m ... documents/ You can contact me off list if you are interested.

cheers .. Jason

Re: Howto correctly transform content controls with XML data

Posted: Tue Mar 06, 2012 4:13 am
by mortomanos
Hi Jason,

both documents are generated with the above code (using content controls and XML bindings) and should then be merged. What I don't know yet is how to do this correctly, because I don't think the XML from doc B will be transferred automatically to the customXML folder in doc A (under a different filename, say item27.xml). And there may be other problems: since the generated XML has the same structure in both documents, the XPaths may no longer be unique.
I hope you see my points?

Michael

Re: Howto correctly transform content controls with XML data

Posted: Tue Mar 06, 2012 9:23 am
by jason
If you strip the content controls using RemovalHandler (after using docx4j to perform the binding), then the XPaths are gone and the customXML becomes irrelevant.

Users can of course edit the docx, but since the content controls will be gone, their changes won't be reflected in your custom xml part.

If on the other hand, you want users to be able to edit the docx and have their changes reflected in the custom xml, there are 2 approaches.

The first and simplest is to ensure that the custom xml part in each docx has a unique storeItemID; if this is the case, copy the custom xml part across and things should just work (the XPaths can be identical; it is the associated storeItemID that identifies which custom xml part an XPath addresses).

The second, which I include for the sake of completeness, is to merge the 2 custom xml parts, and alter the XPaths in the content controls accordingly. More work, and no point?
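
For the first approach, a quick way to see whether two custom xml parts would collide is to compare the ds:itemID values in their item properties parts (customXml/itemPropsN.xml), since that is the value the w:storeItemID attribute on a binding refers to. Below is a minimal sketch using only the JDK, assuming the two itemProps files have already been read into strings; the class and method names are invented for this example. Documents produced from one template may well carry the same ID, in which case one of them would need a new ID (and matching w:storeItemID values) before copying. Comparing the IDs ignoring case is an assumption made here to err on the side of reporting a collision.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: compares the store item IDs of two custom xml parts.
class StoreItemIdCheck {
    static final String DS_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/customXml";

    // Reads ds:itemID from the root ds:datastoreItem element; "" if absent.
    static String itemId(String itemPropsXml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document d = f.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(itemPropsXml)));
            return d.getDocumentElement().getAttributeNS(DS_NS, "itemID");
        } catch (Exception e) {
            throw new IllegalArgumentException("itemProps XML could not be parsed", e);
        }
    }

    // True only if both parts have an ID and the IDs differ.
    static boolean idsAreDistinct(String itemPropsA, String itemPropsB) {
        String a = itemId(itemPropsA);
        String b = itemId(itemPropsB);
        return !a.isEmpty() && !b.isEmpty() && !a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        String propsA = "<ds:datastoreItem xmlns:ds='" + DS_NS
            + "' ds:itemID='{11111111-1111-1111-1111-111111111111}'/>";
        String propsB = "<ds:datastoreItem xmlns:ds='" + DS_NS
            + "' ds:itemID='{22222222-2222-2222-2222-222222222222}'/>";
        System.out.println("Distinct IDs: " + idsAreDistinct(propsA, propsB));
    }
}
```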

Re: Howto correctly transform content controls with XML data

Posted: Fri Mar 09, 2012 8:05 pm
by mortomanos
Hi Jason,

thanks for your suggestions. I will go the RemovalHandler route, since it is absolutely not necessary to reflect the users' changes in the XML parts. In my case, the XML is only for document generation via templates, not for a "way back". I'll implement it and will update the thread with my results.

Thanks for your great support!
Michael

Re: Howto correctly transform content controls with XML data

Posted: Fri Mar 09, 2012 10:28 pm
by mortomanos
Hmm, something is not quite clear to me.

This is what I tried:

Code:
RemovalHandler removalHandler = new RemovalHandler();
removalHandler.removeSDTs(wordMLPackageDocA, Quantifier.ALL);
removalHandler.removeSDTs(wordMLPackageDocB, Quantifier.ALL);
wordMLPackageDocA.getMainDocumentPart().addTargetPart(wordMLPackageDocB.getMainDocumentPart());


But the result was: headers / footers from Doc A, with only the content from Doc B. What am I doing wrong?

Re: Howto correctly transform content controls with XML data

Posted: Sat Mar 10, 2012 4:16 pm
by jason
You shouldn't do

Code:
wordMLPackageDocA.getMainDocumentPart().addTargetPart(wordMLPackageDocB.getMainDocumentPart());


That is adding the main document part of doc B as a rel of doc A's.

You probably want:

Code:
wordMLPackageDocA.getMainDocumentPart().getContent().addAll(wordMLPackageDocB.getMainDocumentPart().getContent());


Remember, something that simple will only work if docB's main document part has no rels (hyperlinks, images, footnotes, etc.).
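
One way to check that precondition before merging is to look for relationship references in doc B's document.xml. The sketch below uses only the JDK (not docx4j) and assumes document.xml has already been read into a string; the class and method names are made up for illustration. It collects every attribute in the relationships namespace (r:id, r:embed, r:link and so on). Note the limitation: footnote and comment references in the body use w:id rather than an r: attribute, so this check would not catch those.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: finds relationship ids referenced from document.xml.
class RelReferenceCheck {
    static final String REL_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Returns the values of all attributes in the relationships namespace.
    static Set<String> findRelIds(String documentXml) {
        Document doc;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            doc = f.newDocumentBuilder()
                   .parse(new InputSource(new StringReader(documentXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("document.xml could not be parsed", e);
        }
        Set<String> ids = new LinkedHashSet<>();
        NodeList all = doc.getElementsByTagNameNS("*", "*"); // every element
        for (int i = 0; i < all.getLength(); i++) {
            NamedNodeMap attrs = all.item(i).getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                if (REL_NS.equals(attr.getNamespaceURI())) {
                    ids.add(attr.getNodeValue());
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<w:document"
            + " xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'"
            + " xmlns:r='" + REL_NS + "'><w:body><w:p><w:hyperlink r:id='rId5'>"
            + "<w:r><w:t>link</w:t></w:r></w:hyperlink></w:p></w:body></w:document>";
        System.out.println("Relationship ids referenced: " + findRelIds(xml));
    }
}
```

If the returned set is empty, appending the content as shown above is more likely to be safe; if not, each referenced part would have to be carried over and the ids remapped.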

Re: Howto correctly transform content controls with XML data

Posted: Mon Mar 12, 2012 11:46 pm
by mortomanos
After implementing your suggestion, the solution looks sufficient. Thanks for your help so far.

Re: Howto correctly transform content controls with XML data

Posted: Tue Mar 13, 2012 11:01 pm
by mortomanos
Very interesting. After an OS change to Win7 x64 with O2k10 (Office 2010), something strange happened:

My document template (saved with O2k7) works correctly:

Code:
Header DocA Header DocA Header DocA Header DocA Header DocA
-----------------------------------------------------------

Paragraph DocA Paragraph DocA Paragraph DocA Paragraph DocA
Paragraph DocA Paragraph DocA Paragraph DocA Paragraph DocA

Image DocA Image DocA Image DocA Image DocA Image DocA Imag

Paragraph DocB Paragraph DocB Paragraph DocB Paragraph DocB
Paragraph DocB Paragraph DocB Paragraph DocB Paragraph DocB

Image DocB Image DocB Image DocB Image DocB Image DocB Imag

-----------------------------------------------------------
Footer DocA Footer DocA Footer DocA Footer DocA Footer DocA


The same with a template saved in O2k10:

Code:
Paragraph DocA Paragraph DocA Paragraph DocA Paragraph DocA
Paragraph DocA Paragraph DocA Paragraph DocA Paragraph DocA

Image DocA Image DocA Image DocA Image DocA Image DocA Imag

Paragraph DocB Paragraph DocB Paragraph DocB Paragraph DocB
Paragraph DocB Paragraph DocB Paragraph DocB Paragraph DocB

Image DocB Image DocB Image DocB Image DocB Image DocB Imag


The header from DocA is missing. Is this possible? Is something missing in docx4j, so that it handles the slightly different O2k10 OpenXML output differently? The only explanation I have is that I use content controls in the header area, and something changed between O2k7 and O2k10 regarding the content controls.

Re: Howto correctly transform content controls with XML data

Posted: Wed Mar 14, 2012 12:55 am
by jason
docx4j shouldn't be doing anything different with your docx.

It is more likely that you somehow removed the document-level sectPr element (or the headerReference/footerReference elements inside it); check towards the end of document.xml.

You could compare the main document part (document.xml) of the 2 documents, and inspect the header parts.
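
The sectPr check can also be scripted. Here is a rough sketch with the JDK's DOM parser (not docx4j), assuming document.xml has been extracted from the docx and read into a string; the class and method names are invented for this example. It looks only at sectPr elements that are direct children of w:body and lists the header and footer references found there, so running it on the O2k7-based and the O2k10-based output should show whether the references went missing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists header/footer references in the body-level sectPr.
class SectPrCheck {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static boolean isW(Node n, String localName) {
        return n.getNodeType() == Node.ELEMENT_NODE
            && W_NS.equals(n.getNamespaceURI())
            && localName.equals(n.getLocalName());
    }

    // Returns entries such as "headerReference:default"; empty if none exist.
    static List<String> bodyLevelRefs(String documentXml) {
        Document doc;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            doc = f.newDocumentBuilder()
                   .parse(new InputSource(new StringReader(documentXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("document.xml could not be parsed", e);
        }
        List<String> refs = new ArrayList<>();
        NodeList bodies = doc.getElementsByTagNameNS(W_NS, "body");
        if (bodies.getLength() == 0) {
            return refs;
        }
        // Only direct children of w:body count as the document-level sectPr.
        for (Node c = bodies.item(0).getFirstChild(); c != null; c = c.getNextSibling()) {
            if (!isW(c, "sectPr")) {
                continue;
            }
            for (Node n = c.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (isW(n, "headerReference") || isW(n, "footerReference")) {
                    String type = ((Element) n).getAttributeNS(W_NS, "type");
                    refs.add(n.getLocalName() + ":" + type);
                }
            }
        }
        return refs;
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w='" + W_NS + "'"
            + " xmlns:r='http://schemas.openxmlformats.org/officeDocument/2006/relationships'>"
            + "<w:body><w:p/><w:sectPr><w:headerReference w:type='default' r:id='rId7'/>"
            + "</w:sectPr></w:body></w:document>";
        System.out.println("Body-level header/footer refs: " + bodyLevelRefs(xml));
    }
}
```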