Split large Docx file into multiple small Docx files

Posted: Tue Apr 02, 2019 6:17 pm
by purushotham
Please help me.

I am converting a large docx to HTML using Docx4J, and it is taking a long time.

Can I split the large docx file into multiple smaller docx files?

Re: Split large Docx file into multiple small Docx files

Posted: Tue Apr 02, 2019 6:53 pm
by jason
How many pages?

The high tech approach would be to use MergeDocx, part of the commercial Docx4j Enterprise Edition.

The low tech approach is simply to make copies of the docx, each containing a different "chunk" of the MainDocumentPart's content list.

With the low tech approach, each of the small docx files would contain all the images etc. (i.e. including those not actually used in the "chunk"), but this probably doesn't matter for your application.

There are some issues to consider when splitting a document up, for example:

- does it contain sections which inherit from one another? You probably don't care if you are creating HTML.

- hyperlinks/cross references between chunks
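
The chunking step of the low tech approach can be sketched without touching docx4j at all. The helper below is hypothetical (the names `ContentChunks` and `ranges` are not part of docx4j); it only works out which index ranges of the content list each small docx should keep. It assumes you split by number of top-level content items, which is not the same thing as splitting by page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of docx4j: plans how to cut a content list
// of a given size into consecutive chunks.
final class ContentChunks {

    // Returns {start, end} pairs (end exclusive) covering 0..itemCount,
    // each spanning at most chunkSize items.
    static List<int[]> ranges(int itemCount, int chunkSize) {
        if (itemCount < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException(
                "itemCount must be >= 0 and chunkSize must be > 0");
        }
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start < itemCount; start += chunkSize) {
            int end = Math.min(start + chunkSize, itemCount);
            result.add(new int[] { start, end });
        }
        return result;
    }
}
```

For each range you would then load a fresh copy of the original docx, drop every item of the MainDocumentPart's content list that falls outside that range, and save the result under a new file name. Here `itemCount` would be the size of that content list.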

Re: Split large Docx file into multiple small Docx files

Posted: Tue Apr 02, 2019 8:08 pm
by purushotham
I am giving it a 200-page docx file. It takes nearly 3 minutes.

Re: Split large Docx file into multiple small Docx files

Posted: Tue Apr 02, 2019 8:36 pm
by jason
Uploading to where? What CPU? What -Xmx? You could try to understand where the time is being consumed.

There is also Docx4J.FLAG_EXPORT_PREFER_NONXSL which is quicker, but missing some features. See https://github.com/plutext/docx4j/blob/ ... .java#L155
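
To get a rough idea of where the time is being consumed, one option is to wrap each stage (loading the docx, converting to HTML, writing the output) in a small timer. This is a minimal sketch using only the JDK. `StageTimer` and `time` are hypothetical names, and wall-clock timing of a single run is only a first approximation; a profiler will tell you more.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: runs one stage, prints how long it took, and
// returns the stage's result unchanged.
final class StageTimer {

    static <T> T time(String label, Callable<T> stage) throws Exception {
        long start = System.nanoTime();
        try {
            return stage.call();
        } finally {
            // Printed even if the stage throws, so failures are timed too.
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(label + ": " + elapsedMs + " ms");
        }
    }
}
```

You would call it once per stage, for example `StageTimer.time("load", () -> ...)` around the code that loads the docx and again around the HTML export, then compare the printed figures.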

Re: Split large Docx file into multiple small Docx files

Posted: Tue Apr 02, 2019 9:01 pm
by purushotham
Thank you very much for the suggestion.

The performance issue was because of my CPU.

Docx4J.FLAG_EXPORT_PREFER_NONXSL is also working, and it is faster.