Plutext

Posted: **Fri Feb 21, 2014 9:47 pm**

Hi,

When I run this code:

Code:
WordprocessingMLPackage wordprocessingMLPackage = WordprocessingMLPackage.load(new File("c:\\document.docx"));
System.out.println(XmlUtils.marshaltoString(wordprocessingMLPackage.getMainDocumentPart().getJaxbElement(), true, true));

it throws an exception:

Code:
INFO [org.docx4j.utils.XPathFactoryUtil:22] - xpath implementation: org.apache.xpath.jaxp.XPathFactoryImpl
INFO [org.docx4j.openpackaging.io3.Load3:180] - package read; elapsed time: 4635 ms
INFO [org.docx4j.openpackaging.parts.JaxbXmlPart:129] - Lazily unmarshalling /word/document.xml
INFO [org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware:299] - For org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart, unmarshall via binder
java.lang.NumberFormatException: For input string: "10206.0"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:458)
    at java.math.BigInteger.<init>(BigInteger.java:316)
    at java.math.BigInteger.<init>(BigInteger.java:451)
    at com.sun.xml.bind.DatatypeConverterImpl._parseInteger(DatatypeConverterImpl.java:88)

I looked at document.xml, and it does indeed contain:

Code:
...
<w:tblW w:w="10206.0" w:type="dxa"/>
...

So I conclude that the error is caused by the value 10206.0 of the w:w attribute on <w:tblW: docx4j expects an integer there, not a decimal.
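The failure can be reproduced without docx4j, since the stack trace ends in java.math.BigInteger. The following is a minimal sketch using only the JDK; the class and method names (WidthParsing, parseStrict, parseLenient) are made up for illustration and are not part of docx4j or JAXB.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class WidthParsing {

    /** Strict parse: BigInteger accepts only an optional sign followed by digits. */
    static BigInteger parseStrict(String raw) {
        return new BigInteger(raw); // rejects "10206.0" with NumberFormatException
    }

    /** Lenient parse (an assumption, not docx4j behaviour): read as decimal, drop the fraction. */
    static BigInteger parseLenient(String raw) {
        return new BigDecimal(raw).toBigInteger();
    }

    public static void main(String[] args) {
        try {
            parseStrict("10206.0");
        } catch (NumberFormatException e) {
            System.out.println("strict parse failed: " + e.getMessage());
        }
        System.out.println("lenient parse: " + parseLenient("10206.0"));
    }
}
```

The lenient variant only shows that the value itself is recoverable; it does not change how the JAXB binding reads the attribute.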

I tested with docx4j-3.0.1, but the problem is the same.

Note: document.docx was created in Google Docs and downloaded as docx.

Attached is the document.docx

Many Thanks

Regards
Sérgio

Posted: **Fri Feb 21, 2014 11:25 pm**

A quick look suggests Google Docs should be writing an integer.

<xsd:simpleType name="ST_DecimalNumber">
    <xsd:annotation>
        <xsd:documentation>Decimal Number Value</xsd:documentation>
    </xsd:annotation>
    <xsd:restriction base="xsd:integer"/>
</xsd:simpleType>

See http://webapp.docx4java.org/OnlineDemo/ ... blW_2.html for @w:

The possible values for this attribute are defined by the ST_DecimalNumber simple type

http://webapp.docx4java.org/OnlineDemo/ ... umber.html

This simple type specifies that its contents will contain a whole decimal number (positive or negative)

Perhaps you could raise it with Google at https://productforums.google.com/forum/ ... ories/docs

Posted: **Mon Feb 24, 2014 8:50 pm**

Thanks, Jason

If I understand correctly, you are saying that docx4j itself can do nothing to solve this, and that the problem is that the conversion done by Google Docs does not respect the XSD. Is that right?

As you suggested, I posted a topic in the Google forum, but getting a reply may be difficult (or take a long time) given the number of posts there every day.

How would you suggest I work around this?

Thanks
Sérgio

Posted: **Mon Feb 24, 2014 9:16 pm**

Does it later say "encountered unexpected content; pre-processing"?

It should. If so, you could modify "org/docx4j/jaxb/mc-preprocessor.xslt" to replace the decimal with an integer.

If you are comfortable with XSLT, I'd be happy to accept an appropriate template rule as a contrib. If not, let me know...
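As an illustration of the kind of template rule meant here, the sketch below runs an identity transform plus one extra rule that truncates any w:w attribute containing a decimal point. It is an assumption about what such a rule could look like, not the content of mc-preprocessor.xslt or of any docx4j commit; the class name DecimalWidthXslt and the method fix are hypothetical, and only the JDK's built-in XSLT 1.0 processor is used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DecimalWidthXslt {

    // Hypothetical stylesheet: copy everything, but rewrite decimal w:w values.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
        // Identity rule: copy every node and attribute unchanged.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // More specific rule: keep only the part of w:w before the decimal point.
        + "<xsl:template match=\"@w:w[contains(., '.')]\">"
        + "<xsl:attribute name='w:w'>"
        + "<xsl:value-of select=\"substring-before(., '.')\"/>"
        + "</xsl:attribute>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to an XML string and returns the result as a string. */
    static String fix(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<w:tbl xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
            + "<w:tblW w:w='10206.0' w:type='dxa'/>"
            + "</w:tbl>";
        System.out.println(fix(xml));
    }
}
```

Because the rule matches on the attribute name rather than on the parent element, it applies to w:w wherever it occurs. It truncates rather than rounds, which is a design choice of this sketch.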

Posted: **Mon Feb 24, 2014 10:13 pm**

Jason,

It does not say "encountered unexpected content; pre-processing".

The full stack trace is:

Code:
...
INFO [org.docx4j.openpackaging.contenttype.ContentTypeManager:802] - Detected WordProcessingML package
INFO [org.docx4j.openpackaging.io3.Load3:161] - Instantiated package of type org.docx4j.openpackaging.packages.WordprocessingMLPackage
INFO [org.docx4j.utils.XPathFactoryUtil:22] - xpath implementation: org.apache.xpath.jaxp.XPathFactoryImpl
INFO [org.docx4j.openpackaging.io3.Load3:180] - package read; elapsed time: 4504 ms
INFO [org.docx4j.openpackaging.parts.JaxbXmlPart:129] - Lazily unmarshalling /word/document.xml
INFO [org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware:299] - For org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart, unmarshall via binder
java.lang.NumberFormatException: For input string: "10206.0"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:458)
    at java.math.BigInteger.<init>(BigInteger.java:316)
    at java.math.BigInteger.<init>(BigInteger.java:451)
    at com.sun.xml.bind.DatatypeConverterImpl._parseInteger(DatatypeConverterImpl.java:88)
    at com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$22.parse(RuntimeBuiltinLeafInfoImpl.java:733)
    at com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$22.parse(RuntimeBuiltinLeafInfoImpl.java:736)
    at com.sun.xml.bind.v2.runtime.reflect.TransducedAccessor$CompositeTransducedAccessorImpl.parse(TransducedAccessor.java:241)
    at com.sun.xml.bind.v2.runtime.unmarshaller.StructureLoader.startElement(StructureLoader.java:202)
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:449)
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:427)
    at com.sun.xml.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:71)
    at com.sun.xml.bind.v2.runtime.unmarshaller.SAXConnector.startElement(SAXConnector.java:137)
    at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:240)
    at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:277)
    at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:246)
    at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:277)
    at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:246)
    at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:277)
    at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:246)
    at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:277)
    at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:246)
    at com.sun.xml.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:123)
    at com.sun.xml.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:106)
    at com.sun.xml.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:99)
    at com.sun.xml.bind.v2.runtime.BinderImpl.associativeUnmarshal(BinderImpl.java:156)
    at com.sun.xml.bind.v2.runtime.BinderImpl.unmarshal(BinderImpl.java:127)
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:316)
    at org.docx4j.openpackaging.parts.JaxbXmlPart.getJaxbElement(JaxbXmlPart.java:130)
    at exp.siga.aaa.docx4j.testes.Main.main(Main.java:36)

So I conclude that modifying mc-preprocessor.xslt cannot solve my problem. Is that right?

Thanks

Posted: **Mon Feb 24, 2014 11:28 pm**

Hi Sérgio,

Here is what happens in org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal

try {
    jaxbElement = (E) XmlUtils.unwrap(binder.unmarshal(doc));
} catch (UnmarshalException ue) {
    // try the mc-preprocessor.xslt stuff
    ...
}

In this case the exception is NumberFormatException (as opposed to UnmarshalException) - interesting.

We can catch that and do the same sort of thing. It may be expedient to do it in mc-preprocessor.xslt; if we don't, then the exception handled there could still occur (we'd need to experiment to see what order they happen in).

Posted: **Tue Feb 25, 2014 2:11 am**

Thanks in advance, Jason.

I think I need to do something like this:

- Create a JaxbXmlPartXPathAware class in my project that overrides the one in the jar, changed as follows:

Code:
try {
    jaxbElement = (E) XmlUtils.unwrap(binder.unmarshal(doc)); // Unwrap, so we have eg CTEndnotes, not JAXBElement
} catch (NumberFormatException numberFormatException) {
    DOMResult result = new DOMResult();
    Templates mcPreprocessorXslt = JaxbValidationEventHandler.getMcPreprocessor();
    XmlUtils.transform(doc, mcPreprocessorXslt, null, result);
    doc = (org.w3c.dom.Document) result.getNode();
    try {
        jaxbElement = (E) XmlUtils.unwrap(binder.unmarshal(doc));
    } catch (ClassCastException e) {
        Unmarshaller u = jc.createUnmarshaller();
        jaxbElement = (E) XmlUtils.unwrap(u.unmarshal(doc));
    }
} catch (UnmarshalException ue) {
    ... // no changes
}

Now my difficulty is understanding mc-preprocessor.xslt and adapting it to my issue.
Am I on the right track?

My other way around the problem would be to get the content of 'document.xml' as a string (as XmlUtils.marshaltoString() does, but without XSD validation). Then, with a regex pattern, I could replace the w:w= values with integers (truncating the decimal places). Is there any way to get the string content?
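The regex idea could look roughly like the sketch below. It assumes the text of 'document.xml' is already available as a Java String (how to obtain it is left open here), and the class name WidthRegexFix, the method truncateDecimalWidths and the pattern itself are illustrative choices, not docx4j API.

```java
import java.util.regex.Pattern;

public class WidthRegexFix {

    // Matches w:w="<digits>.<digits>" and captures the opening part, the integer part
    // and the closing quote, so that the fractional part can be dropped.
    private static final Pattern DECIMAL_W =
            Pattern.compile("(\\bw:w=\")(-?\\d+)\\.\\d+(\")");

    /** Returns the XML text with every decimal w:w value truncated to its integer part. */
    static String truncateDecimalWidths(String xml) {
        return DECIMAL_W.matcher(xml).replaceAll("$1$2$3");
    }

    public static void main(String[] args) {
        String xml = "<w:tblW w:w=\"10206.0\" w:type=\"dxa\"/><w:pgSz w:w=\"12240\"/>";
        System.out.println(truncateDecimalWidths(xml));
    }
}
```

Values that are already integers are left alone because the pattern requires a decimal point. Attributes written with single quotes or with a different namespace prefix would not be matched by this pattern.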

Thanks

Posted: **Tue Feb 25, 2014 1:05 pm**

Please see now https://github.com/plutext/docx4j/commi ... bd41be7bc3

I'll upload a nightly incorporating this later today.

I guess the question is: on what other elements/attributes does Google Docs make the same error? Please add to this thread if you find any others...

sfmorais wrote: My other way to around the problem is to get the content of 'document.xml' like string text

So far we've managed to avoid any optional string manipulation step prior to unmarshalling; that may need to change in the future.

Posted: **Tue Feb 25, 2014 8:18 pm**

Please try http://www.docx4java.org/docx4j/docx4j- ... 140225.jar

Posted: **Tue Feb 25, 2014 9:39 pm**

Many, many thanks in advance, Jason.

On closer analysis of 'document.xml' (for my simple docx from the first post), I found the w:w= attribute on other elements too, such as <w:pgSz, <w:right, <w:bottom, <w:left, <w:top and <w:gridCol. More complex documents may contain others (I will tell you when I find them).

But some w:w= values appear as integers, as in <w:pgSz. (I don't know what criteria Google uses to write some w:w= values as integers and others with decimal places.)

The best approach is to handle both possible forms (integer and decimal) for every w:w, and to round or truncate when the value is decimal.

Your latest jar solves the first occurrence of w:w (<w:tblW w:w="10206.0") for me, but now the same exception appears for a "100.0".

Attached is the document.xml

Many Thanks

Regards,
Sergio

Posted: **Wed Feb 26, 2014 11:20 am**

OK please see now https://github.com/plutext/docx4j/commi ... e00d5ca257

By the way, could you please add a link to your post in the Google Docs forum, to make it easier to monitor for any replies?

Posted: **Thu Feb 27, 2014 1:34 am**

Hi Jason

I did some tests and the problem is solved.
Your XSLT solution really is the best way to handle all w:w attributes on any node.

The related thread in the Google Docs forum is
https://productforums.google.com/forum/#!msg/docs/fcFovKihMtw/Ii7_mf1bO9wJ

Many thanks for your nice solution.

Regards,
Sérgio

problem with document created by Google Docs
