
Exception comparing two docx files

Posted: Tue Oct 03, 2017 12:54 am
by bs_dellacqua
I need to compare two docx files using the Java (1.8) code in CompareDocx1.java (taken from the docx4j examples).

CompareDocx1.java
Java class (created starting from docx4j example) to compare 2 docx files.


The external libraries used (with JDK 1.8) are:
Maven: antlr:antlr:2.7.7
Maven: com.fasterxml.jackson.core:jackson-annotations:2.7.0
Maven: com.fasterxml.jackson.core:jackson-core:2.7.3
Maven: com.fasterxml.jackson.core:jackson-databind:2.7.3
Maven: com.google.guava:guava:19.0
Maven: com.thedeanda:lorem:2.0
Maven: commons-codec:commons-codec:1.10
Maven: commons-io:commons-io:2.4
Maven: commons-logging:commons-logging:1.2
Maven: log4j:log4j:1.2.17
Maven: net.arnx:wmf2svg:0.9.8
Maven: net.engio:mbassador:1.2.4.2
Maven: org.antlr:antlr-runtime:3.5.2
Maven: org.antlr:stringtemplate:3.2.1
Maven: org.apache.avalon.framework:avalon-framework-api:4.3.1
Maven: org.apache.avalon.framework:avalon-framework-impl:4.3.1
Maven: org.apache.commons:commons-lang3:3.4
Maven: org.apache.httpcomponents:httpclient:4.5.2
Maven: org.apache.httpcomponents:httpcore:4.4.4
Maven: org.apache.xmlgraphics:batik-anim:1.8
Maven: org.apache.xmlgraphics:batik-awt-util:1.8
Maven: org.apache.xmlgraphics:batik-bridge:1.8
Maven: org.apache.xmlgraphics:batik-css:1.8
Maven: org.apache.xmlgraphics:batik-dom:1.8
Maven: org.apache.xmlgraphics:batik-ext:1.8
Maven: org.apache.xmlgraphics:batik-extension:1.8
Maven: org.apache.xmlgraphics:batik-gvt:1.8
Maven: org.apache.xmlgraphics:batik-parser:1.8
Maven: org.apache.xmlgraphics:batik-script:1.8
Maven: org.apache.xmlgraphics:batik-svg-dom:1.8
Maven: org.apache.xmlgraphics:batik-svggen:1.8
Maven: org.apache.xmlgraphics:batik-transcoder:1.8
Maven: org.apache.xmlgraphics:batik-util:1.8
Maven: org.apache.xmlgraphics:batik-xml:1.8
Maven: org.apache.xmlgraphics:fop:2.1
Maven: org.apache.xmlgraphics:xmlgraphics-commons:2.1
Maven: org.docx4j:docx4j:3.3.3
Maven: org.docx4j:docx4j-export-fo:3.3.0
Maven: org.plutext:jaxb-svg11:1.0.2
Maven: org.plutext:jaxb-xslfo:1.0.1
Maven: org.slf4j:jcl-over-slf4j:1.7.21
Maven: org.slf4j:slf4j-api:1.7.21
Maven: org.slf4j:slf4j-log4j12:1.7.21
Maven: xalan:serializer:2.7.2
Maven: xalan:xalan:2.7.2

When comparing 611_2.docx and 611_1.docx (edited with Word 2013), I get the following errors:

DIVIDE_AND_CONQUER = false
<?xml version="1.0" encoding="utf-8"?><w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:dfx="http://www.topologi.com/2005/Diff-X" xmlns:del="http://www.topologi.com/2005/Diff-X/Delete" xmlns:ins="http://www.topologi.com/2005/Diff-X"
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,243592]
Message: http://www.w3.org/TR/1999/REC-xml-names ... paraId&w14
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:601)
at org.docx4j.diff.Differencer.combineAdjacent(Differencer.java:1233)
at org.docx4j.diff.Differencer.diffWorker(Differencer.java:415)
at org.docx4j.diff.Differencer.diff(Differencer.java:302)
at carlo.CompareDocx1.main(CompareDocx1.java:90)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:50)
at org.docx4j.diff.Differencer.diffWorker(Differencer.java:424)
at org.docx4j.diff.Differencer.diff(Differencer.java:302)
at carlo.CompareDocx1.main(CompareDocx1.java:90)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; End of file anticipated.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1437)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1019)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:243)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:214)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:157)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:125)
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:557)
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:530)
at carlo.CompareDocx1.main(CompareDocx1.java:99)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
javax.xml.bind.JAXBException: Preprocessing exception
- with linked exception:
[org.docx4j.openpackaging.exceptions.Docx4JException: Cannot perform the transformation]
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:586)
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:530)
at carlo.CompareDocx1.main(CompareDocx1.java:99)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: org.docx4j.openpackaging.exceptions.Docx4JException: Cannot perform the transformation
at org.docx4j.XmlUtils.transform(XmlUtils.java:1357)
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:581)
... 7 more
Caused by: javax.xml.transform.TransformerException: End of file anticipated.
at org.apache.xalan.transformer.TransformerImpl.fatalError(TransformerImpl.java:782)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:758)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1275)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1253)
at org.docx4j.XmlUtils.transform(XmlUtils.java:1355)
... 8 more

DIVIDE_AND_CONQUER = true
Differencing..
javax.xml.bind.JAXBException: Preprocessing exception
- with linked exception:
[org.docx4j.openpackaging.exceptions.Docx4JException: Cannot perform the transformation]
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:586)
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:530)
at carlo.CompareDocx1.main(CompareDocx1.java:99)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: org.docx4j.openpackaging.exceptions.Docx4JException: Cannot perform the transformation
at org.docx4j.XmlUtils.transform(XmlUtils.java:1357)
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:581)
... 7 more
Caused by: javax.xml.transform.TransformerException: The prefix "w14" for the attribute "w14:paraId" associated to an element type "w:p" is not associated.
at org.apache.xalan.transformer.TransformerImpl.fatalError(TransformerImpl.java:782)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:758)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1275)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1253)
at org.docx4j.XmlUtils.transform(XmlUtils.java:1355)
... 8 more

This problem also occurs with other docx files.

Does anyone have any idea how to solve the problem?
Thanks

Re: Exception comparing two docx files

Posted: Tue Oct 03, 2017 6:32 pm
by jason
https://github.com/plutext/docx4j/commi ... 223f00a1c5 should help

For your input files, though, there are rels (relationships referenced from the content) which still need handling. I haven't tackled that.
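
For context on the "w14" message in the DIVIDE_AND_CONQUER = true trace: that is the error a namespace-aware XML parser reports when an attribute such as w14:paraId appears in a fragment that has no xmlns:w14 declaration in scope. The following is only a minimal sketch of that behaviour using the JDK's built-in parser; it does not use docx4j and is not the fix from the commit above. The class name NamespacePrefixDemo and the sample paraId value are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows an unbound namespace prefix being rejected.
public class NamespacePrefixDemo {

    public static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static final String W14_NS =
            "http://schemas.microsoft.com/office/word/2010/wordml";

    /** Returns true if the XML string parses with a namespace-aware parser. */
    public static boolean parses(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        db.setErrorHandler(new DefaultHandler());
        try {
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // An unbound prefix is a fatal (well-formedness) error.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // w14 prefix used on an attribute but never declared.
        String unbound = "<w:p xmlns:w=\"" + W_NS + "\" w14:paraId=\"1A2B3C4D\"/>";
        // Same element, with xmlns:w14 declared.
        String bound = "<w:p xmlns:w=\"" + W_NS + "\" xmlns:w14=\"" + W14_NS
                + "\" w14:paraId=\"1A2B3C4D\"/>";

        System.out.println("unbound prefix parses: " + parses(unbound));
        System.out.println("declared prefix parses: " + parses(bound));
    }
}
```

The first string is rejected and the second is accepted, which is why a w:body fragment that carries w14:paraId attributes needs the w14 namespace declared somewhere on it or on an ancestor.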

Re: Exception comparing two docx files

Posted: Tue Oct 31, 2017 4:54 am
by bs_dellacqua
We downloaded the latest available revisions of Differencer.java and Docx4jDriver.java and built a new version of docx4j.jar from them.
If we execute the following code:

public static void main(String[] args) {
    try {
        String newFile = System.getProperty("user.dir") + "/diff/523_1_3.docx";
        String oldFile = System.getProperty("user.dir") + "/diff/523_1_2.docx";

        // Load both versions of the document
        WordprocessingMLPackage newerPackage = WordprocessingMLPackage.load(new File(newFile));
        WordprocessingMLPackage olderPackage = WordprocessingMLPackage.load(new File(oldFile));

        // Extract the w:body of each main document part
        Body newerBody = (newerPackage.getMainDocumentPart().getJaxbElement()).getBody();
        Body olderBody = (olderPackage.getMainDocumentPart().getJaxbElement()).getBody();

        // The diff output is written to this StringWriter via the StreamResult
        java.io.StringWriter sw = new java.io.StringWriter();
        javax.xml.transform.stream.StreamResult result = new javax.xml.transform.stream.StreamResult(sw);
        Calendar changeDate = Calendar.getInstance();

        Differencer pd = new Differencer();

        pd.diff(newerBody, olderBody, result, "someone", changeDate,
                newerPackage.getMainDocumentPart().getRelationshipsPart(),
                olderPackage.getMainDocumentPart().getRelationshipsPart()
        );
    } catch (Exception e) {
        e.printStackTrace();
    }
}

we get the following exception:

java.lang.IllegalStateException: Cannot write attribute: too late!
at com.topologi.diffx.xml.XMLWriterBase.attribute(XMLWriterBase.java:281)
at com.topologi.diffx.format.SmartXMLFormatter.delete(SmartXMLFormatter.java:184)
at com.topologi.diffx.algorithm.DiffXFitopsy.process(DiffXFitopsy.java:173)
at com.topologi.diffx.Docx4jDriver.mainDiff(Docx4jDriver.java:167)
at com.topologi.diffx.Docx4jDriver.diff(Docx4jDriver.java:401)
at org.docx4j.diff.Differencer.diffWorker(Differencer.java:319)
at org.docx4j.diff.Differencer.diff(Differencer.java:297)
at carlo.compareDocx.CompareDocxTmp.main(CompareDocxTmp.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
java.lang.NullPointerException
at org.docx4j.diff.Differencer.diffWorker(Differencer.java:329)
at org.docx4j.diff.Differencer.diff(Differencer.java:297)
at carlo.compareDocx.CompareDocxTmp.main(CompareDocxTmp.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

The two docx files we compared are attached.
Any suggestions about this problem?
Thanks and best regards