docx4j crashes while comparing documents

Posted: Tue May 29, 2012 10:53 am
by tvn
Hi!

I have tried the latest release of docx4j (2.8.0).
I want to compare two documents to see the differences.
I ran the document comparison example, and over several tests I found that docx4j "crashes" while comparing some documents (some are compared fine, but others produce errors).
The documents themselves are rather simple: some text, one table, and that's it.

The stack trace of the error:

java.lang.IllegalArgumentException: Cannot write attribute: too late!
at com.topologi.diffx.xml.NSAwareXMLWriter.attribute(NSAwareXMLWriter.java:484)
at com.topologi.diffx.format.SmartXMLFormatter.delete(SmartXMLFormatter.java:282)
at com.topologi.diffx.algorithm.DiffXFitopsy.process(DiffXFitopsy.java:272)
at com.topologi.diffx.Main.diff(Main.java:323)
at com.topologi.diffx.Docx4jDriver.diff(Docx4jDriver.java:327)
at org.docx4j.diff.Differencer.diffWorker(Differencer.java:320)
at org.docx4j.diff.Differencer.diff(Differencer.java:298)
at CompareDocuments.main(CompareDocuments.java:74)
java.lang.NullPointerException
at org.docx4j.diff.Differencer.diffWorker(Differencer.java:377)
at org.docx4j.diff.Differencer.diff(Differencer.java:298)
at CompareDocuments.main(CompareDocuments.java:74)

Can anybody help resolve this issue?

If necessary, I can send the sample docx files I used; please let me know the email address.

Thanks a lot in advance!

Re: docx4j crashes while comparing documents

Posted: Tue May 29, 2012 11:06 am
by tvn
I tried manually setting isNude to true to avoid the error, and then checked the generated XML.
A fragment of that XML is below.

It looks like docx4j constructs the XML incorrectly for some reason:
<w:t xml:space="preserve">del:rsidP="002C1C18" del:rsidR="00426AF4" del:rsidRDefault="00426AF4" del:rsidRPr="002C1C18"&gt;

I just don't understand what the correct XML output should look like for delete events in this case.

<w:ins xmlns:xalan="http://xml.apache.org/xalan" xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage" w:date="2009-03-11T17:57:00Z" w:author="someone" w:id="11">
<w:r>
<w:rPr>
<w:rFonts w:ascii="Arial" w:cs="Arial" w:eastAsia="Times New Roman" w:hAnsi="Arial"/>
<w:color w:val="000000"/>
<w:sz w:val="19"/>
<w:szCs w:val="19"/>
</w:rPr>
<w:t xml:space="preserve">Modified 5</w:t>
</w:r>
</w:ins>
<w:r xmlns:xalan="http://xml.apache.org/xalan" xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
<w:t xml:space="preserve"> del:rsidP="002C1C18" del:rsidR="00426AF4" del:rsidRDefault="00426AF4" del:rsidRPr="002C1C18"&gt;</w:t>
</w:r>
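The message "Cannot write attribute: too late!" suggests the writer was asked for an attribute after the element's start tag had already been closed. The sketch below illustrates that ordering rule with the JDK's own StAX writer, used here only as a stand-in: it is not diffx's NSAwareXMLWriter, and the class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class LateAttributeDemo {

    /** Returns true if the writer rejects an attribute written after text content. */
    static boolean lateAttributeRejected() {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("t");
            w.writeAttribute("space", "preserve"); // accepted: start tag is still open
            w.writeCharacters("Modified 5");       // text content closes the start tag
            w.writeAttribute("rsidP", "002C1C18"); // attribute arrives too late
            return false;
        } catch (XMLStreamException | IllegalStateException e) {
            // Streaming writers refuse attributes once the start tag is closed.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("late attribute rejected: " + lateAttributeRejected());
    }
}
```

A writer that did not enforce this rule would have nowhere to put the attribute except the text stream, which matches the fragment above where the del: attributes end up inside the w:t content.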

Re: docx4j crashes while comparing documents

Posted: Wed May 30, 2012 2:46 pm
by jason
You may as well attach the documents causing the problems, but as to fixing it, you're on your own for the moment, I'm afraid.

If you can't get it working, here are 2 other diff projects you might try integrating with docx4j, in place of com.topologi.diffx:

http://code.google.com/p/fc-xmldiff/

https://github.com/tanob/jxydiff

Please let us know how you go...

Re: docx4j crashes while comparing documents

Posted: Wed May 30, 2012 6:17 pm
by tvn
Hi Jason,

Thank you for the reply!

I will try to fix it or use a diffx alternative...

The only question I have: can you please explain how such entries should look for the output XML to be correct?

<w:t xml:space="preserve">del:rsidP="002C1C18" del:rsidR="00426AF4" del:rsidRDefault="00426AF4" del:rsidRPr="002C1C18"&gt;

Thanks a lot!

Re: docx4j crashes while comparing documents

Posted: Wed May 30, 2012 6:37 pm
by jason
Those things are diffx attributes; it should probably be producing:

<w:t xml:space="preserve" del:rsidP="002C1C18" del:rsidR="00426AF4" del:rsidRDefault="00426AF4" del:rsidRPr="002C1C18">

They need to be post-processed in order to get valid WordML.
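As a rough illustration of one such post-processing step, the sketch below walks a DOM tree and removes every attribute in a given namespace. It is not docx4j's own post-processing, which the thread does not show. The class name, helper names, and the namespace URI bound to the del: prefix are assumptions; check the URI against the diffx output you actually get.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class StripDiffAttributes {

    // ASSUMPTION: namespace URI bound to the del: prefix in the diff output.
    static final String DEL_NS = "http://www.topologi.com/2005/Diff-X/Delete";

    /** Removes all attributes in namespace ns below node; returns how many were removed. */
    static int strip(Node node, String ns) {
        int removed = 0;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            NamedNodeMap attrs = node.getAttributes();
            // Iterate backwards because the attribute map is live and shrinks on removal.
            for (int i = attrs.getLength() - 1; i >= 0; i--) {
                Attr a = (Attr) attrs.item(i);
                if (ns.equals(a.getNamespaceURI())) {
                    ((Element) node).removeAttributeNode(a);
                    removed++;
                }
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            removed += strip(c, ns);
        }
        return removed;
    }

    /** Parses an XML string with namespace awareness switched on. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:t xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'"
                + " xmlns:del='" + DEL_NS + "' xml:space='preserve'"
                + " del:rsidP='002C1C18' del:rsidR='00426AF4'>text</w:t>";
        Document doc = parse(xml);
        System.out.println("removed " + strip(doc.getDocumentElement(), DEL_NS) + " attributes");
    }
}
```

This only drops the diff attributes and leaves the xmlns:del declaration in place. Turning deletions into proper tracked-change markup would need further processing, for example wrapping the affected runs in w:del.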