
xPath with Word 2010

Posted: Thu Sep 27, 2012 4:04 am
by adi
Hi all

Our customers are going to introduce Office 2010, and we want to support them with this very good library.
Jason, can you say when Word 2010 will be fully supported?

I have an issue when searching for Content Controls with XPath in Word 2010 documents.
With a 2010 document it doesn't find the Content Controls, while with a 2007 document it works.
The code is the same for both versions:

Code:
    String xpathSdt = "//w:sdt/w:sdtPr/w:tag";
    //String xpathSdt = "//w:sdt";

    List<Object> listAllContentControls = null;
    try {
        // Content Controls in MainDocumentPart
        listAllContentControls = mainPart.getJAXBNodesViaXPath(xpathSdt, false);
    } catch (JAXBException e) {
        log.error("Exception: " + e.getMessage(), e);
        return null;
    }


I'm using docx4j 2.8.0 and Java 1.6 in Rational Software Architect.

Both versions of the document are attached.
Could you give me some help or a clue, please?


thanks
Adrian

Re: xPath with Word 2010

Posted: Thu Sep 27, 2012 6:23 pm
by jason
Hi Adrian

Apart from the issue you have raised here, what practical problems is the current approach to 2010 documents causing you?

In the class JaxbXmlPartXPathAware, we have:

Code:
    try {
        jaxbElement = (E) XmlUtils.unwrap(binder.unmarshal( doc ));
    } catch (ClassCastException cce) {
        /*
         * Work around for issue with JAXB binder, in Java 1.6
         * encountered with /src/test/resources/jaxb-binder-issue.docx
         * See http://old.nabble.com/BinderImpl.associ ... 56585.html
         * and  http://java.net/jira/browse/JAXB-874
         *
         * java.lang.ClassCastException: org.docx4j.wml.PPr cannot be cast to javax.xml.bind.JAXBElement
                at com.sun.xml.internal.bind.v2.runtime.ElementBeanInfoImpl$IntercepterLoader.intercept(Unknown Source)
                at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.endElement(Unknown Source)
                at com.sun.xml.internal.bind.v2.runtime.unmarshaller.InterningXmlVisitor.endElement(Unknown Source)
                at com.sun.xml.internal.bind.v2.runtime.unmarshaller.SAXConnector.endElement(Unknown Source)
                at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(Unknown Source)
                at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(Unknown Source)
                at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(Unknown Source)
                at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(Unknown Source)
                at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(Unknown Source)
                at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(Unknown Source)
                at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(Unknown Source)
                at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(Unknown Source)
                at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(Unknown Source)
                at com.sun.xml.internal.bind.unmarshaller.DOMScanner.scan(Unknown Source)
                at com.sun.xml.internal.bind.unmarshaller.DOMScanner.scan(Unknown Source)
                at com.sun.xml.internal.bind.unmarshaller.DOMScanner.scan(Unknown Source)
                at com.sun.xml.internal.bind.v2.runtime.BinderImpl.associativeUnmarshal(Unknown Source)
                at com.sun.xml.internal.bind.v2.runtime.BinderImpl.unmarshal(Unknown Source)
                at org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart.unmarshal(MainDocumentPart.java:321)
         */

        log.warn("Binder not available for this docx");
        Unmarshaller u = jc.createUnmarshaller();
        jaxbElement = (E) XmlUtils.unwrap(u.unmarshal( doc ));
    }
 


I just tested with the current JAXB reference implementation 2.2.6, and the problem still occurs.

So perhaps you could vote for http://java.net/jira/browse/JAXB-874 and add a comment to that effect?

The workaround is to use something like:

Code:
    // Callback that collects the SdtPr of every content control it is shown
    static class SdtPrFinder extends CallbackImpl {

        List<SdtPr> sdtPrList = new ArrayList<SdtPr>();

        @Override
        public List<Object> apply(Object o) {

            // Content controls can be block-, run-, row- or cell-level
            if (o instanceof org.docx4j.wml.SdtBlock
                    || o instanceof org.docx4j.wml.SdtRun
                    || o instanceof org.docx4j.wml.CTSdtRow
                    || o instanceof org.docx4j.wml.CTSdtCell ) {

                SdtPr sdtPr = OpenDoPEHandler.getSdtPr(o);
                if (sdtPr!=null) {
                    sdtPrList.add(sdtPr);
                }
            }
            return null;
        }
    }

    // Walk the content tree, passing each object to the finder
    SdtPrFinder sdtPrFinder = new SdtPrFinder();
    new TraversalUtil(paragraphs, sdtPrFinder);

    for ( SdtPr sdtPr : sdtPrFinder.sdtPrList) { ...
 


In my code, I always use TraversalUtil. JAXB XPath is a nice idea, but there are too many bugs in the Sun/Oracle JAXB for it to be effective.

Re: xPath with Word 2010

Posted: Fri Sep 28, 2012 1:06 am
by adi
Hi Jason

Thank you very much, it works.
As for Word 2010, I'm testing heavily with hundreds (even thousands) of Word 2010 documents and I'll give you feedback soon.

Adrian

Re: xPath with Word 2010

Posted: Tue Nov 06, 2012 7:36 am
by jason
For anyone finding this topic, if you want to continue to use XPath, see docx-java-f6/moxy-t1242.html which may be worth a try.
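
Another option that might be worth a try if you only need to read values such as the tag: evaluate the same XPath against a plain DOM with the JDK's own javax.xml.xpath, so the JAXB binder isn't involved at all. The following is only a rough sketch, not docx4j API. The class name SdtTagXPathSketch, the method findTags and the SAMPLE XML are made up for illustration, and in a real docx you would first have to read the main document XML (usually word/document.xml) out of the package yourself. Note that it gives you DOM nodes, not docx4j JAXB objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of docx4j
class SdtTagXPathSketch {

    // WordprocessingML main namespace, bound to the "w" prefix below
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Made-up, heavily trimmed document XML with two content controls
    static final String SAMPLE =
            "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
          + "<w:sdt><w:sdtPr><w:tag w:val=\"customerName\"/></w:sdtPr>"
          + "<w:sdtContent><w:p/></w:sdtContent></w:sdt>"
          + "<w:p><w:sdt><w:sdtPr><w:tag w:val=\"invoiceNo\"/></w:sdtPr>"
          + "<w:sdtContent><w:r/></w:sdtContent></w:sdt></w:p>"
          + "</w:body></w:document>";

    // Returns the w:val of every w:tag found under //w:sdt/w:sdtPr
    static List<String> findTags(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for prefixed XPath steps
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return W_NS.equals(uri) ? "w" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return W_NS.equals(uri)
                            ? Collections.singletonList("w").iterator()
                            : Collections.<String>emptyList().iterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                    "//w:sdt/w:sdtPr/w:tag", doc, XPathConstants.NODESET);

            List<String> tags = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element tag = (Element) nodes.item(i);
                tags.add(tag.getAttributeNS(W_NS, "val"));
            }
            return tags;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findTags(SAMPLE));
    }
}
```

If you need to modify the content controls through docx4j objects afterwards, the TraversalUtil approach above is probably still the better fit, since this sketch has no link back to the JAXB object tree.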