OK, first: docx4j deals mainly with docx documents, so if one of the documents you want to compare is an old binary .doc, you will need to convert it to docx first. There is proof-of-concept code for doing this in org.docx4j.convert.in, but it is far from fully featured. Your best bet might be the b2xtranslator project (on SourceForge). That is written in C#; it would be good to port it to Java.
So now, assuming you are comparing two docx files, we have code for comparing two paragraphs (or two sdts) in the org.docx4j.diff package. This works pretty well, and produces a result containing tracked changes.
While you could try using the underlying library to compare two main document parts, your mileage may vary. I think you would be better off using LCS (longest common subsequence) or a similar algorithm to find which paragraphs correspond, and then using org.docx4j.diff on those. The source of org.eclipse.compare has been sitting (unused, as far as I know) in the docx4j source tree for a while now; you might try starting with that.
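To make the LCS idea concrete, here is a rough sketch in plain Java. It is not docx4j code: the class name ParagraphAligner and the method align are made up for illustration, and it assumes you have already extracted each paragraph's text into a String (how you do that is up to you). It matches paragraphs only when their text is identical, which is the simplest possible choice; a similarity measure might serve better in practice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of docx4j: aligns two lists of paragraph
// texts using a longest common subsequence (LCS) over exact string matches.
class ParagraphAligner {

    /**
     * Returns index pairs {i, j} such that a.get(i).equals(b.get(j)),
     * in document order, forming a longest common subsequence.
     */
    static List<int[]> align(List<String> a, List<String> b) {
        int n = a.size();
        int m = b.size();

        // len[i][j] = LCS length of a[i..] and b[j..]
        int[][] len = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (a.get(i).equals(b.get(j))) {
                    len[i][j] = len[i + 1][j + 1] + 1;
                } else {
                    len[i][j] = Math.max(len[i + 1][j], len[i][j + 1]);
                }
            }
        }

        // Walk the table from the start to recover the matched pairs.
        List<int[]> pairs = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                pairs.add(new int[] {i, j});
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                i++; // skip a paragraph that exists only in a
            } else {
                j++; // skip a paragraph that exists only in b
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("Intro", "Old text", "Summary");
        List<String> right = Arrays.asList("Intro", "New text", "Extra", "Summary");
        for (int[] p : align(left, right)) {
            System.out.println("left[" + p[0] + "] matches right[" + p[1] + "]");
        }
    }
}
```

The matched pairs act as anchors. The paragraphs left over between two anchors are the ones that were inserted, deleted, or changed, so those are the candidates to pair up and hand to org.docx4j.diff.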
Finally, to produce a valid resulting document, you need to ensure that relationship ids (e.g. for images and hyperlinks) point to the correct relationships, that styles are defined, and so on. A good explanation of what you need to do can be found at http://blogs.msdn.com/ericwhite/archive ... l-sdk.aspx
cheers .. Jason