How to check if documents are the same

Posted: Tue Mar 12, 2013 4:45 am
by slowikps
I saw an example that compares two documents, but I can't find an easy way to tell whether two documents are the same.
How can I determine this? (I don't care about the details; I only want to know whether the documents are the same.)

Thanks,
Pawel

Re: How to check if documents are the same

Posted: Tue Mar 12, 2013 6:23 pm
by jason
It depends on your notion of "equality".

Are they the same if they contain the same text, ignoring formatting? This is easy to compute.

Or are they the same if they have the same text and formatting? This is harder, especially if you want to say that <w:r><w:t>The quick brown fox</w:t></w:r> is the same as <w:r><w:t>The quick</w:t></w:r><w:r><w:t> brown fox</w:t></w:r> and <w:r><w:t>The quick</w:t><w:t> brown fox</w:t></w:r>.
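
To illustrate the text-only notion of equality, here is a minimal sketch that uses only the JDK's DOM parser, not docx4j. It concatenates the content of every w:t element in a fragment and compares the resulting strings, so the way the text is split across runs stops mattering. The class name RunTextCompare and the helpers plainText and sameText are made up for this example, and the fragments are wrapped in a w:p element only so that the w: prefix is declared.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RunTextCompare {
    // Namespace bound to the w: prefix in WordprocessingML.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: concatenate the content of every w:t element,
    // in document order, ignoring how the text is split across runs.
    static String plainText(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Wrap the fragment so the w: prefix is declared and there is one root.
            String wrapped = "<w:p xmlns:w=\"" + W_NS + "\">" + fragment + "</w:p>";
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));
            NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < texts.getLength(); i++) {
                sb.append(texts.item(i).getTextContent());
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse fragment", e);
        }
    }

    // Hypothetical helper: "same" here means same text, formatting ignored.
    static boolean sameText(String a, String b) {
        return plainText(a).equals(plainText(b));
    }

    public static void main(String[] args) {
        String one = "<w:r><w:t>The quick brown fox</w:t></w:r>";
        String two = "<w:r><w:t>The quick</w:t></w:r><w:r><w:t> brown fox</w:t></w:r>";
        String three = "<w:r><w:t>The quick</w:t><w:t> brown fox</w:t></w:r>";
        System.out.println(sameText(one, two));
        System.out.println(sameText(one, three));
    }
}
```

Under this definition all three fragments compare as equal. Comparing formatting as well would need more than this, since run properties would have to be normalized before comparison.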

Context can help here. Are you comparing documents created in Word, and not subsequently saved using something else (including for example, docx4j)?

Re: How to check if documents are the same

Posted: Tue Mar 12, 2013 7:58 pm
by slowikps
Hi Jason,
Thanks for the answer. I have a template document (created using OpenDoPE), and I bind the template to XML data. I need an equals method so that I can write a unit test for this.
After binding, I want to compare the result with a “correct output document” to check whether they are the same. I think comparing the text while ignoring formatting should be enough for me.
Thanks,
Pawel

Re: How to check if documents are the same

Posted: Tue Mar 12, 2013 9:46 pm
by jason
In that case you can use org.docx4j.TextUtils to extract the text of each document, then compare the two strings.

Re: How to check if documents are the same

Posted: Tue Mar 12, 2013 10:16 pm
by slowikps
Thanks!