
Merging documents causes custom xml doubling

Posted: Mon May 09, 2011 4:40 am
by Peter.BY
I'm trying to merge a complex document containing an altChunk into a single document. The main document has a custom XML file embedded in it. After running the merge procedure as follows:

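Code:

    // Load the docx that contains the altChunk
    wordMLPackage = WordprocessingMLPackage.load(new java.io.File(outputfilepath));
    // Merge the altChunk content into the main document
    wordMLPackage = ProcessAltChunk.process(wordMLPackage);
    // Write the merged package to disk
    SaveToZipFile saver = new SaveToZipFile(wordMLPackage);
    saver.save(finaloutpath);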


I get two custom xml files with identical content. Is this a bug or defined behaviour?
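
One way to confirm the duplication is to open the saved docx as a zip and compare the custom xml parts byte for byte. The following is a minimal sketch using only the JDK, not docx4j. The class and method names are made up for illustration, and it assumes the data parts follow the usual customXml/itemN.xml naming, which a package is not required to use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class CustomXmlDuplicateCheck {

    /**
     * Returns groups of custom xml data parts whose bytes are identical.
     * Assumption: data parts are named customXml/itemN.xml
     * (itemPropsN.xml entries are their properties parts and are skipped).
     */
    static List<List<String>> findIdenticalParts(InputStream docx) throws IOException {
        // ByteBuffer compares by content, so it can serve as the map key.
        Map<ByteBuffer, List<String>> byContent = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!name.matches("customXml/item\\d+\\.xml")) {
                    continue;
                }
                // Read the whole entry into memory.
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = zip.read(buf)) != -1) {
                    content.write(buf, 0, n);
                }
                byContent
                    .computeIfAbsent(ByteBuffer.wrap(content.toByteArray()),
                                     k -> new ArrayList<>())
                    .add(name);
            }
        }
        // Keep only the groups that contain more than one part.
        List<List<String>> duplicates = new ArrayList<>();
        for (List<String> names : byContent.values()) {
            if (names.size() > 1) {
                duplicates.add(names);
            }
        }
        return duplicates;
    }
}
```

Calling findIdenticalParts(new java.io.FileInputStream(finaloutpath)) on the merged file should then list the doubled parts together in one group.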

Re: Merging documents causes custom xml doubling

Posted: Mon May 09, 2011 10:44 am
by jason
There are 5 so-called "Well Defined Custom XML Parts" in OpenXML. The merged docx will contain at most one coverPageProperties part - the first one we encounter. The other 4 parts are dropped.

Regarding other custom xml parts, these will be dropped if they do not have a CustomXmlDataStoragePropertiesPart.

Regarding the OpenDoPE custom xml parts, these will be merged, i.e. if there are 2 ConditionsParts, they will be merged to form one.

All other custom xml parts will be copied. This ensures that you can have data bindings in two documents, and they ought to keep the correct data. For example, 2 invoices each specifying a customer name via custom xml data binding; after the merge, the customer names should still be correct.
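
The rules above can be read as a small decision function. What follows is only a sketch of that reading in plain Java, not MergeDocx code: the class, enum and parameter names are made up for illustration, and it assumes the properties-part check is applied before the OpenDoPE and copy rules, which the description above does not spell out.

```java
public class CustomXmlMergeRules {

    /** Hypothetical categories of custom xml part, named for this sketch only. */
    enum Kind { COVER_PAGE_PROPERTIES, OTHER_WELL_DEFINED, OPENDOPE, OTHER }

    /** What happens to a part when the documents are merged. */
    enum Action { KEEP_FIRST, DROP, MERGE, COPY }

    /**
     * @param kind                 category of the part
     * @param hasStorageProperties whether the part has a CustomXmlDataStoragePropertiesPart
     * @param coverPageAlreadyKept whether a coverPageProperties part was already taken
     */
    static Action decide(Kind kind, boolean hasStorageProperties, boolean coverPageAlreadyKept) {
        // Well-defined parts: at most one coverPageProperties part survives,
        // the other four kinds are dropped.
        if (kind == Kind.COVER_PAGE_PROPERTIES) {
            return coverPageAlreadyKept ? Action.DROP : Action.KEEP_FIRST;
        }
        if (kind == Kind.OTHER_WELL_DEFINED) {
            return Action.DROP;
        }
        // Assumption: the properties-part check comes before the remaining rules.
        if (!hasStorageProperties) {
            return Action.DROP;
        }
        // OpenDoPE parts of the same type are merged into one; everything else is copied.
        return kind == Kind.OPENDOPE ? Action.MERGE : Action.COPY;
    }
}
```

Under this reading, a bound data part with a properties part always ends up as COPY, which is why the two invoices in the example each keep their own customer name.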

In your case, your altChunk Part2.docx does not contain any custom xml, so I'll have to look to see where the duplicate copy is coming from.

Re: Merging documents causes custom xml doubling

Posted: Mon May 09, 2011 11:36 am
by Peter.BY
Thank you for your response and explanation. I'm not sure I understood everything you wrote; it looks like I need to dig deeper into the OpenXML spec and how docx4j works :)
The scenario I'm trying to implement is the following: there are a number of docx templates that may have content controls. A template with a content control has an empty custom XML part, which is required to set up data binding manually in Content Control Toolkit. Later it is filled programmatically with the actual data (this part works OK). Each template may also include others via altChunk, in a tree-like manner.
For now I use a special content control (with the path of the template to include in a tag attribute), which is later replaced with an altChunk by modifying the JAXB tree. Before merging I save the document to disk to avoid any possible side effects (though I haven't found any).
I'm aware that there is an od:component element that makes this easier, but for now I'm staying with the standard schema. I expect that each template will probably have its own custom XML, exactly as in the two-invoice case you described. I just want to make sure that doubling will not occur if an included document has no custom XML for data binding; since there could be a number of them, this could lead to excessive information being placed in the final document.

Re: Merging documents causes custom xml doubling

Posted: Tue May 10, 2011 1:19 am
by jason
OK, that issue is addressed. It was occurring where a single docx was being merged twice.

This is effectively what happens when you have an altChunk, since in that case MergeDocx merges 3 things:

- the contents of the docx up to where the altChunk occurs
- the contents of the altChunk
- the contents of the docx that occur after the altChunk
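
To illustrate why this doubles the custom xml, here is a toy model in plain Java. It is not MergeDocx code: the Segment class, the method names, "Main.docx" and the part name are all made up for this sketch, and the once-per-source check is just one possible way to avoid the duplicate, not necessarily how the fix was implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AltChunkMergeSketch {

    /** Hypothetical stand-in for one merge input: a slice of some docx. */
    static final class Segment {
        final String sourceDocx;           // package the slice was taken from
        final List<String> customXmlParts; // custom xml parts of that package

        Segment(String sourceDocx, List<String> customXmlParts) {
            this.sourceDocx = sourceDocx;
            this.customXmlParts = customXmlParts;
        }
    }

    /** A host docx with one altChunk becomes three merge inputs. */
    static List<Segment> hostWithOneAltChunk() {
        List<String> hostParts = Arrays.asList("customXml/item1.xml");
        return Arrays.asList(
            new Segment("Main.docx", hostParts),                        // before the altChunk
            new Segment("Part2.docx", Collections.<String>emptyList()), // the altChunk itself
            new Segment("Main.docx", hostParts));                       // after the altChunk
    }

    /** Copies custom xml for every input: a package seen twice is copied twice. */
    static List<String> copyNaively(List<Segment> segments) {
        List<String> copied = new ArrayList<>();
        for (Segment s : segments) {
            copied.addAll(s.customXmlParts);
        }
        return copied;
    }

    /** Copies custom xml only the first time each source package is seen. */
    static List<String> copyOncePerSource(List<Segment> segments) {
        Set<String> seen = new HashSet<>();
        List<String> copied = new ArrayList<>();
        for (Segment s : segments) {
            if (seen.add(s.sourceDocx)) {
                copied.addAll(s.customXmlParts);
            }
        }
        return copied;
    }
}
```

With the three inputs from hostWithOneAltChunk(), copyNaively yields the host's part twice, while copyOncePerSource yields it once.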