Docx comparison without merging.

Posted: Mon Jun 25, 2018 3:27 pm
by parvathy.pb
Hi All,
I want to compare two docx files and produce an output file that highlights what changed in the new file relative to the old one, in both the text and the style formatting. At present I am using the following sample code for the comparison: https://github.com/plutext/docx4j/blob/ ... ments.java but it doesn't compare styling such as bold, italics or underline. Can anyone help me figure out the issue, please?
Thanks.

Re: Docx comparison without merging.

Posted: Mon Jun 25, 2018 5:46 pm
by jason
You'll need to provide more detail including the 2 input documents you are comparing.

Re: Docx comparison without merging.

Posted: Mon Jun 25, 2018 6:02 pm
by parvathy.pb
I have attached sample1.docx, which is my old file, and sample2.docx, which is the newer one. Note that sample1 has a bold paragraph which is not bold in sample2, but this change in style is not shown as a difference. Could you suggest ways to compare the styling as well?
Thanks.

Re: Docx comparison without merging.

Posted: Mon Jun 25, 2018 7:17 pm
by jason
If diffx is capturing style differences (which you'll need to confirm), then those differences are being dropped by diffx2wml.xslt.

You could modify diffx2wml.xslt to meet your use case. I personally find the tracking of formatting differences to be annoying and distracting, so I wouldn't be surprised if I did it this way :-)

Note that the differencing typically only compares document content (e.g. stuff in the Main Document Part), not the styles part, so unless you also compared the styles part, you'd only be picking up differences in directly applied formatting.
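
To make "directly applied formatting" concrete: in the main document part it lives in a run's w:rPr element (w:b, w:i, w:u), next to the text. Here is a minimal sketch that lists each run with its direct bold/italic/underline flags, so two documents can be inspected side by side. It uses only the JDK's DOM parser, not docx4j or diffx; the class and method names are made up for this example, and treating a w:u element with no w:val as "on" is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
final class DirectFormattingProbe {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** One entry per w:r element: the run text, then its direct b/i/u flags, e.g. "Hello [b]". */
    static List<String> describeRuns(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(documentXml)));
        NodeList runs = doc.getElementsByTagNameNS(W, "r");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < runs.getLength(); i++) {
            TreeSet<String> flags = new TreeSet<>();
            StringBuilder text = new StringBuilder();
            for (Node c = runs.item(i).getFirstChild(); c != null; c = c.getNextSibling()) {
                if (!(c instanceof Element) || !W.equals(c.getNamespaceURI())) continue;
                if ("rPr".equals(c.getLocalName())) {
                    // Run properties: collect the toggles that are switched on.
                    for (Node p = c.getFirstChild(); p != null; p = p.getNextSibling()) {
                        if (p instanceof Element && isOn((Element) p)) flags.add(p.getLocalName());
                    }
                } else if ("t".equals(c.getLocalName())) {
                    text.append(c.getTextContent());
                }
            }
            out.add(text + " " + flags);
        }
        return out;
    }

    /** True for w:b, w:i or w:u unless w:val explicitly switches the property off. */
    static boolean isOn(Element p) {
        String name = p.getLocalName();
        if (!W.equals(p.getNamespaceURI())) return false;
        if (!(name.equals("b") || name.equals("i") || name.equals("u"))) return false;
        String val = p.getAttributeNS(W, "val"); // empty string when the attribute is absent
        return !(val.equals("false") || val.equals("0") || val.equals("off") || val.equals("none"));
    }
}
```

A bold run containing "Hello" comes out as "Hello [b]", and the same run without bold as "Hello []". Formatting that comes from a named style will not show up here, because it is stored in the styles part rather than in the run.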

Re: Docx comparison without merging.

Posted: Mon Jun 25, 2018 8:05 pm
by parvathy.pb
Thanks. But could you please suggest a way to find the style differences as well? Is there any Java library available to do this?

Re: Docx comparison without merging.

Posted: Tue Jun 26, 2018 10:19 am
by jason
I don't know about other libraries, sorry, it's been years since I researched this. My findings at the time are documented in the code.

You might be able to achieve what you want with diffx (ie in docx4j). You can compare the styles parts. You could even compare a Flat OPC representation of each document.
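
If you want to look at a styles part outside docx4j first, a docx file is a zip package, so the part can be pulled out with the JDK alone. This is a rough sketch only: the class and method names are invented for the example, the part name is passed in by the caller (word/styles.xml is the conventional location, but the package relationships are what actually define it), and the part is assumed to be UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of docx4j.
final class DocxPartReader {

    /** Returns the named part as text, or null if the package has no such entry. */
    static String readPart(InputStream docx, String partName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if (!entry.getName().equals(partName)) continue;
                // read() signals end-of-stream at the end of the current entry.
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                for (int n = zip.read(chunk); n != -1; n = zip.read(chunk)) {
                    buffer.write(chunk, 0, n);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8); // assumes UTF-8
            }
        }
        return null;
    }
}
```

Calling readPart on each of the two input files gives you two XML strings that can be fed to whatever comparison you settle on.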

It depends what you want to achieve, and what you can assume about the input documents (ie how they were created/edited).

For example, in the main document part, order matters (obviously). In the list of styles in the styles part, it doesn't (mostly anyway!). So in the styles part, you care about the contents of Heading 1, not its position in the list.
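
To illustrate the "contents, not position" point, here is a minimal sketch that keys each w:style element by its w:styleId and compares the definitions pairwise, so reordering the list alone reports nothing. It uses the JDK's DOM API rather than diffx or docx4j, and the class and method names are made up for the example. It relies on Node.isEqualNode, which is strict: whitespace-only text nodes or a different namespace prefix would count as a change, so treat it as a starting point.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
final class StylesPartDiff {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Indexes every w:style element by its w:styleId, discarding list position. */
    static Map<String, Element> stylesById(String stylesXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        NodeList nodes = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(stylesXml)))
                .getElementsByTagNameNS(W, "style");
        Map<String, Element> byId = new TreeMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element style = (Element) nodes.item(i);
            byId.put(style.getAttributeNS(W, "styleId"), style);
        }
        return byId;
    }

    /** Maps styleId to "added", "removed" or "changed"; unchanged styles are omitted. */
    static Map<String, String> diff(String oldXml, String newXml) throws Exception {
        Map<String, Element> oldStyles = stylesById(oldXml);
        Map<String, Element> newStyles = stylesById(newXml);
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, Element> e : oldStyles.entrySet()) {
            Element other = newStyles.get(e.getKey());
            if (other == null) {
                result.put(e.getKey(), "removed");
            } else if (!e.getValue().isEqualNode(other)) {
                result.put(e.getKey(), "changed"); // deep comparison of the definition
            }
        }
        for (String id : newStyles.keySet()) {
            if (!oldStyles.containsKey(id)) result.put(id, "added");
        }
        return result;
    }
}
```

With this, a Heading1 that lost its bold setting is reported as "changed" even if it also moved within the list, while a Normal style that only moved is not reported at all.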