
Reading Word-Document from .xml file

Posted: Sat Jan 28, 2012 5:09 am
by Matthias
Hi Jason,

I am trying to read a Word document from an .xml file. The file was saved from Microsoft Word 2010 using 'Save as Word XML Document (*.xml)'.

Here is my code:
Code: Select all
String docx = "/report_templates/MyDoc.xml";
// JAXB context for the Flat OPC (pkg:package) schema
JAXBContext jc = Context.jcXmlPackage;
Unmarshaller u = jc.createUnmarshaller();
u.setEventHandler(new org.docx4j.jaxb.JaxbValidationEventHandler());
InputStream iStream = getClass().getResourceAsStream(docx);
// Unmarshal the Flat OPC XML into the package element
org.docx4j.xmlPackage.Package wmlPackageEl = (org.docx4j.xmlPackage.Package)
        ((JAXBElement<?>) u.unmarshal(new javax.xml.transform.stream.StreamSource(iStream))).getValue();
// Import the package element; this is the call that fails
org.docx4j.convert.in.FlatOpcXmlImporter xmlPackage = new org.docx4j.convert.in.FlatOpcXmlImporter(wmlPackageEl);
WordprocessingMLPackage wordMLPackage = (WordprocessingMLPackage) xmlPackage.get();


This is the error I am getting:
Code: Select all
org.docx4j.openpackaging.exceptions.Docx4JException: Failed to add parts from relationships
        at org.docx4j.convert.in.FlatOpcXmlImporter.addPartsFromRelationships(FlatOpcXmlImporter.java:258)
        at org.docx4j.convert.in.FlatOpcXmlImporter.get(FlatOpcXmlImporter.java:183)
        at com.bmw.eap.common.report.docx.DocxCreator.generateBoardAgenda(DocxCreator.java:136)
        at com.bmw.eap.web.commands.SearchBoardDateAdministrationCmd.exportAgendaWord(SearchBoardDateAdministrationCmd.java:537)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        Truncated. see log file for complete stacktrace
org.docx4j.openpackaging.exceptions.Docx4JException: Failed to getPart
        at org.docx4j.convert.in.FlatOpcXmlImporter.getRawPart(FlatOpcXmlImporter.java:604)
        at org.docx4j.convert.in.FlatOpcXmlImporter.getRawPart(FlatOpcXmlImporter.java:387)
        at org.docx4j.convert.in.FlatOpcXmlImporter.getPart(FlatOpcXmlImporter.java:326)
        at org.docx4j.convert.in.FlatOpcXmlImporter.addPartsFromRelationships(FlatOpcXmlImporter.java:256)
        at org.docx4j.convert.in.FlatOpcXmlImporter.get(FlatOpcXmlImporter.java:183)
        Truncated. see log file for complete stacktrace
java.lang.IllegalArgumentException: prefix dcterms is not bound to a namespace
        at com.sun.xml.bind.DatatypeConverterImpl._parseQName(DatatypeConverterImpl.java:324)
        at com.sun.xml.bind.v2.runtime.unmarshaller.XsiTypeLoader.parseXsiType(XsiTypeLoader.java:52)
        at com.sun.xml.bind.v2.runtime.unmarshaller.XsiTypeLoader.startElement(XsiTypeLoader.java:30)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:369)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:347)
        Truncated. see log file for complete stacktrace


This is the part of my XML where dcterms occurs:
Code: Select all
   <pkg:part pkg:name="/docProps/core.xml" pkg:contentType="application/vnd.openxmlformats-package.core-properties+xml" pkg:padding="256">
      <pkg:xmlData>
         <cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <dc:creator>Matthias</dc:creator>
            <cp:lastModifiedBy>Matthias</cp:lastModifiedBy>
            <cp:revision>2</cp:revision>
            <dcterms:created xsi:type="dcterms:W3CDTF">2012-01-27T15:44:00Z</dcterms:created>
            <dcterms:modified xsi:type="dcterms:W3CDTF">2012-01-27T15:44:00Z</dcterms:modified>
         </cp:coreProperties>
      </pkg:xmlData>
   </pkg:part>
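
One way to rule out the file itself is to parse the fragment with a namespace-aware JDK parser and confirm that the dcterms prefix used in xsi:type resolves. The following is only a minimal sketch using JDK classes; the class name DctermsPrefixCheck and the trimmed, inlined copy of the fragment are illustrative assumptions and are not part of docx4j.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DctermsPrefixCheck {

    // Trimmed copy of the core properties element shown above (illustrative).
    static final String XML =
        "<cp:coreProperties"
        + " xmlns:cp=\"http://schemas.openxmlformats.org/package/2006/metadata/core-properties\""
        + " xmlns:dcterms=\"http://purl.org/dc/terms/\""
        + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
        + "<dcterms:created xsi:type=\"dcterms:W3CDTF\">2012-01-27T15:44:00Z</dcterms:created>"
        + "</cp:coreProperties>";

    /** Returns the namespace URI bound to the prefix of the first child's xsi:type, or null. */
    static String resolveXsiTypePrefix(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed for the DOM namespace methods used below
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element created = (Element) doc.getDocumentElement().getFirstChild();
        String xsiType = created.getAttributeNS(
                "http://www.w3.org/2001/XMLSchema-instance", "type");
        String prefix = xsiType.substring(0, xsiType.indexOf(':'));
        return created.lookupNamespaceURI(prefix);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("dcterms -> " + resolveXsiTypePrefix(XML));
    }
}
```

If the prefix resolves here but the docx4j import still fails, the file is probably fine and the parser setup in the failing environment is the more likely suspect.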


I couldn't find anybody else having this problem in the forum... so I am wondering what I am doing wrong.

Thanks in advance,
Matthias

Re: Reading Word-Document from .xml file

Posted: Wed Feb 01, 2012 6:53 pm
by jason
Which Java are you using?

For Sun Java 6, XmlUtils sets:

Code: Select all
         System.setProperty("javax.xml.parsers.SAXParserFactory",
         "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl");


http://www.docx4java.org/docx4j/docx4j- ... 120125.jar does this for Oracle Java 7 as well.

Let me know whether that helps.

Re: Reading Word-Document from .xml file

Posted: Sat Feb 04, 2012 6:01 am
by Matthias
I am using:
Code: Select all
java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_29-b11)
Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02, mixed mode)

Your new jar file gives me the same error:
Code: Select all
java.lang.IllegalArgumentException: prefix dcterms is not bound to a namespace
        at com.sun.xml.bind.DatatypeConverterImpl._parseQName(DatatypeConverterImpl.java:324)
        at com.sun.xml.bind.v2.runtime.unmarshaller.XsiTypeLoader.parseXsiType(XsiTypeLoader.java:52)
        at com.sun.xml.bind.v2.runtime.unmarshaller.XsiTypeLoader.startElement(XsiTypeLoader.java:30)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:369)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:347)
        at com.sun.xml.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:35)
        at com.sun.xml.bind.v2.runtime.unmarshaller.SAXConnector.startElement(SAXConnector.java:101)
        at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:224)
        at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:261)
        at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:230)
        at com.sun.xml.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:107)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:288)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:271)
        at org.docx4j.openpackaging.parts.JaxbXmlPart.unmarshal(JaxbXmlPart.java:252)
        at org.docx4j.convert.in.FlatOpcXmlImporter.getRawPart(FlatOpcXmlImporter.java:437)
        at org.docx4j.convert.in.FlatOpcXmlImporter.getRawPart(FlatOpcXmlImporter.java:387)
        at org.docx4j.convert.in.FlatOpcXmlImporter.getPart(FlatOpcXmlImporter.java:326)
        at org.docx4j.convert.in.FlatOpcXmlImporter.addPartsFromRelationships(FlatOpcXmlImporter.java:256)
        at org.docx4j.convert.in.FlatOpcXmlImporter.get(FlatOpcXmlImporter.java:183)


Any other ideas?

Thank you,
Matthias

Re: Reading Word-Document from .xml file

Posted: Sat Feb 04, 2012 11:34 am
by jason
You should have no problems with that version of Java.

I would say the first step is to confirm that docx4j can handle your document in a minimal environment (i.e. a new project in your IDE, in which you use docx4j to open the document).

I expect that will work fine.

Then the question becomes: what is it that is screwing things up when you integrate docx4j into your code base? Which SAXParserFactory is being used at that point? What jars are present besides the docx4j ones?
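
To answer the SAXParserFactory question, something like the following could be run once in the minimal project and once inside the failing environment, and the two outputs compared. It is a rough sketch using only JDK APIs; the class name WhichSaxParser and the helper describe are made up for illustration.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;

public class WhichSaxParser {

    /** Describes where a class was loaded from; classes built into the JDK may have no code source. */
    static String describe(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        String location = (src == null || src.getLocation() == null)
                ? "JDK runtime (no code source)"
                : src.getLocation().toString();
        return c.getName() + " from " + location + " via " + c.getClassLoader();
    }

    public static void main(String[] args) {
        // JAXP consults this system property first when choosing an implementation.
        System.out.println("javax.xml.parsers.SAXParserFactory = "
                + System.getProperty("javax.xml.parsers.SAXParserFactory"));
        // The implementation JAXP actually hands out in this environment.
        System.out.println(describe(SAXParserFactory.newInstance().getClass()));
    }
}
```

In a web application the same two statements can be dropped into the code path that performs the import, so that the result reflects the class loader actually in use there.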

Re: Reading Word-Document from .xml file

Posted: Thu Mar 08, 2012 5:20 am
by thetreed1
I am having the same issue. My code generates a document using Aspose for Words, and I then modify the XML with docx4j to add features that Aspose does not support. Everything works fine in Eclipse or from the command line. When the same jar is run from the web server, I get the error discussed above. The output generated by Aspose from the command line matches what is generated from the web, but docx4j has an issue parsing the web-generated document.

My jar bundles its dependencies (i.e. Aspose, docx4j, etc.) so that I know what it's loading.

Any ideas?

Re: Reading Word-Document from .xml file

Posted: Fri Mar 09, 2012 4:58 pm
by jason
What application server are you using?

It is likely that the server is picking up other jars (e.g. from a shared class loader), and that this is causing problems.

Re: Reading Word-Document from .xml file

Posted: Fri Apr 19, 2013 10:07 pm
by jason
I encountered this problem myself today.

It happened when I was using docx4j in Eclipse, with dependencies managed by pom.xml.

I commented out the FOP dependency, and added a project containing FOP SVN tip as of today.

That's when the problem occurred.

I've narrowed the cause of the error down to the jar files the FOP project was exporting (contributing to dependent projects). The problem is one of the following older versions; I haven't bothered to work out which one exactly:

- xercesImpl 2.7.1
- xalan 2.7.0 (versus 2.7.1)
- serializer 2.7.0 (versus 2.7.1)
- avalon 4.2.0 (versus 4.3.1)
- commons-logging 1.0.4 (versus 1.1.1), unlikely to be the problem

So if you are running into this issue, check whether you have one or more of the above on your classpath. That's likely to be the problem. And please do report back here with any insights into which jar is the culprit!
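
One way to check is to ask the class loader where it can see a representative class from each of those jars. The sketch below uses only JDK APIs; the class name FindJarCopies and the choice of probe classes are my own assumptions (one well-known class per jar), so adjust them as needed. More than one location for the same class suggests that several versions are visible at once.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class FindJarCopies {

    // One representative class per jar in the list above (assumed probes).
    static final String[] PROBES = {
        "org/apache/xerces/jaxp/SAXParserFactoryImpl.class",             // xercesImpl
        "org/apache/xalan/processor/TransformerFactoryImpl.class",       // xalan
        "org/apache/xml/serializer/Serializer.class",                    // serializer
        "org/apache/avalon/framework/configuration/Configuration.class", // avalon
        "org/apache/commons/logging/LogFactory.class"                    // commons-logging
    };

    /** Returns every location from which the given class file is visible. */
    static List<URL> locate(String resource) throws IOException {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(cl.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        for (String probe : PROBES) {
            List<URL> hits = locate(probe);
            System.out.println(probe + ": " + (hits.isEmpty() ? "not found" : hits));
        }
    }
}
```

The jar file name in each reported URL normally includes the version, which makes it easy to compare against the list above.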