
Merge docx strategies

Posted: Thu Sep 30, 2010 12:48 am
by Nian
Hi, I have some questions about docx4j and the Open XML file format. Could anyone give me some advice?

1. Can I merge multiple docx files (without using altChunk)?
For various reasons, I have to create a huge docx file.
To make this more efficient, I would like to use multiple threads to create many smaller parts of the huge file
(e.g. one part per chapter, each of them a docx file),
and then merge these smaller parts into one huge docx file.

I tried using altChunk, but it's not exactly what I want.
Also, all the approaches I have found so far for merging docx files are for .NET.

Appending all the document parts of the smaller files together and resolving the relationship problems takes a lot of effort to get working. Is there a better way?
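
As a rough illustration of the multi-threaded plan in question 1, here is a minimal Java sketch using only the standard library. `buildChapter` is a hypothetical stand-in for whatever produces one chapter, and `ChapterPipeline` is a made-up name. The only point it makes is that collecting the futures in submission order keeps the chapters in order, however the threads are scheduled. The merge step itself is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChapterPipeline {

    // Hypothetical placeholder: real code would build a whole chapter here.
    static String buildChapter(int index) {
        return "<w:p><w:r><w:t>Chapter " + index + "</w:t></w:r></w:p>";
    }

    // Builds chapters 1..chapterCount in parallel and returns them in chapter order.
    static List<String> buildAll(int chapterCount, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 1; i <= chapterCount; i++) {
                final int index = i;
                futures.add(pool.submit(() -> buildChapter(index)));
            }
            // Reading the futures in submission order preserves chapter order.
            List<String> chapters = new ArrayList<>();
            for (Future<String> future : futures) {
                chapters.add(future.get());
            }
            return chapters;
        } finally {
            pool.shutdown();
        }
    }
}
```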

2. Can a docx file contain more than one "document.xml" (i.e. main document part)?
To solve problem 1, I tried generating multiple document.xml parts in the zipped docx structure.
I also used a DTD and some other approaches to link those document.xml parts
(i.e. to import other XML parts into the main document.xml), but it doesn't seem to work in Word.
Is there any other approach that could make this work, or solve problem 1?

[EDIT: question 3 moved to a separate topic]

Sorry for the poor English,
and any advice would be appreciated.

Re: Merge docx strategies

Posted: Thu Sep 30, 2010 9:10 am
by jason
I'll have a think about it some more, but my immediate reaction is that merging all the content into the one document.xml is the best approach.
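
A minimal sketch of what merging into one document.xml could look like, using the JDK's DOM API rather than docx4j (this is not docx4j code, and `BodyMerger` and its methods are names made up for illustration). It assumes both main document parts are already unzipped and available as XML strings. It copies the source body's children into the target body and skips the source's body-level `w:sectPr`, so the target keeps a single final section. Styles, numbering and relationships are ignored here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class BodyMerger {

    // WordprocessingML main namespace.
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    static Element body(Document doc) {
        return (Element) doc.getElementsByTagNameNS(W, "body").item(0);
    }

    static boolean isSectPr(Node node) {
        return node.getNodeType() == Node.ELEMENT_NODE
                && W.equals(node.getNamespaceURI())
                && "sectPr".equals(node.getLocalName());
    }

    // Appends the content of source's body to target's body, in place.
    static void append(Document target, Document source) {
        Element targetBody = body(target);

        // Find the target's body-level sectPr, if any, so new content goes before it.
        Node targetSectPr = null;
        for (Node n = targetBody.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isSectPr(n)) {
                targetSectPr = n;
            }
        }

        for (Node n = body(source).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isSectPr(n)) {
                continue; // drop the source's final section properties
            }
            Node copy = target.importNode(n, true);
            // With a null reference node, insertBefore appends at the end.
            targetBody.insertBefore(copy, targetSectPr);
        }
    }
}
```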

docx4j doesn't contain code to do this for you automatically right now, but it's on my list of things to do (unless someone contributes the code first).

It wouldn't be that hard to write, especially if you could constrain the task by assuming that the only things with relationships were, say, images and hyperlinks, or that the styles were the same across documents. It depends on how predictable your input documents are.
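
To make the relationship constraint concrete: images and hyperlinks point at entries in document.xml.rels through attributes such as `r:embed` and `r:id`, and two source documents can both use ids like rId1. Below is a hedged sketch of the renaming step, again with the plain JDK DOM API and hypothetical names (`RelIdRemapper`, `remap`). It assumes the old-to-new id map was built while copying the relationship entries into the target's rels part, which is not shown.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelIdRemapper {

    // Namespace of the r:id / r:embed attributes that reference relationships.
    static final String R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Rewrites every relationship-namespace attribute whose value is a key in idMap.
    // Returns the number of attributes changed.
    static int remap(Document doc, Map<String, String> idMap) {
        int changed = 0;
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            NamedNodeMap attrs = elements.item(i).getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Attr attr = (Attr) attrs.item(j);
                if (R.equals(attr.getNamespaceURI()) && idMap.containsKey(attr.getValue())) {
                    attr.setValue(idMap.get(attr.getValue()));
                    changed++;
                }
            }
        }
        return changed;
    }
}
```

This would run on the source document before its body content is copied into the target, so the copied elements already carry ids that are free in the target.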

May I ask why altChunk doesn't work for you? Is it because you want to continue processing in docx4j (as opposed to opening the docx in Word, and letting it process the altChunk elements)?

If I wrote the merging code, it would also handle an altChunk containing WordML.

Re: Merge docx strategies

Posted: Thu Sep 30, 2010 10:28 pm
by Nian
Hi, Jason!

I just want to know if there is any way to merge files without using altChunk,
because there might be other requirements to process the merged file afterwards.

If there is not, I'll consider using altChunk.

Thanks for your quick response,
not many developers would do this!!

Thanks a lot!!

Re: Merge docx strategies

Posted: Sun Nov 14, 2010 8:03 pm
by jason
I've created a paid extension for docx4j which can merge docx properly.

See http://dev.plutext.org/blog/2010/11/mer ... documents/ for details.

You can buy it at www.plutext.com. Purchases of this extension support the further development of docx4j.