Normalizing docx using VariablePrepare

Posted: Tue Jun 30, 2020 10:54 pm
by kyerie030
Hi,

I'm trying to normalize a document (attached) with the latest docx4j core 8.2.0 and docx4j-JAXB-ReferenceImpl 8.2.0,
using the following code:

Code:
WordprocessingMLPackage wmlPackage = Docx4jUtil.getPackage(docMergeRequest.getTemplateFilePath()); // simply a path to the file to be used
MainDocumentPart mainDocumentPart = wmlPackage.getMainDocumentPart();
Docx4jUtil.cleanupBookmarks(mainDocumentPart);
Docx4jUtil.cleanupComments(mainDocumentPart);
VariablePrepare.prepare(wmlPackage);


When VariablePrepare.prepare is called, I get the following error:

Code:
19:46:17,691 INFO  [stdout] (default task-109) 2020-06-30 19:46:17,690 INFO  [default task-109] org.docx4j.jaxb.Context: java.vendor=Oracle Corporation
19:46:17,693 INFO  [stdout] (default task-109) 2020-06-30 19:46:17,691 INFO  [default task-109] org.docx4j.jaxb.Context: java.version=1.8.0_222-4-redhat
19:46:17,695 INFO  [stdout] (default task-109) 2020-06-30 19:46:17,693 INFO  [default task-109] org.docx4j.jaxb.Context: java.vm.name=OpenJDK 64-Bit Server VM
19:46:21,212 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,211 INFO  [default task-109] org.docx4j.jaxb.Context: JAXB Reference Implementation is in use.
19:46:21,394 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,393 INFO  [default task-109] org.docx4j.XmlUtils: setProperty com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
19:46:21,395 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,394 INFO  [default task-109] org.docx4j.XmlUtils: actual: com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
19:46:21,396 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,395 INFO  [default task-109] org.docx4j.XmlUtils: setProperty com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
19:46:21,396 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,396 INFO  [default task-109] org.docx4j.XmlUtils: actual: com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
19:46:21,426 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,424 INFO  [default task-109] org.docx4j.openpackaging.contenttype.ContentTypeManager: Detected WordProcessingML package
19:46:21,431 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,431 INFO  [default task-109] org.docx4j.openpackaging.contenttype.ContentTypeManager: Detected WordProcessingML package
19:46:21,432 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,431 INFO  [default task-109] org.docx4j.openpackaging.io3.Load3: Instantiated package of type org.docx4j.openpackaging.packages.WordprocessingMLPackage
19:46:21,437 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,436 INFO  [default task-109] org.docx4j.utils.XPathFactoryUtil: xpath implementation: __redirected.__XPathFactory
19:46:21,452 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,451 INFO  [default task-109] org.docx4j.openpackaging.contenttype.ContentTypeManager: Using DocPropsCustomPart ...
19:46:21,453 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,452 INFO  [default task-109] org.docx4j.openpackaging.io3.Load3: package read;  elapsed time: 3784 ms
19:46:21,703 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,702 INFO  [default task-109] org.docx4j.jaxb.NamespacePrefixMapperUtils: Using ri.NamespacePrefixMapper, which is suitable for the JAXB RI
19:46:21,789 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,788 INFO  [default task-109] org.docx4j.openpackaging.parts.DocPropsExtendedPart: unmarshalling org.docx4j.openpackaging.parts.DocPropsExtendedPart
19:46:21,798 INFO  [stdout] (default task-109) 2020-06-30 19:46:21,798 INFO  [default task-109] org.docx4j.openpackaging.parts.DocPropsCorePart: unmarshalling org.docx4j.openpackaging.parts.DocPropsCorePart
19:46:22,025 INFO  [stdout] (default task-109) 2020-06-30 19:46:22,024 INFO  [default task-109] org.docx4j.openpackaging.parts.DocPropsCustomPart: unmarshalling org.docx4j.openpackaging.parts.DocPropsCustomPart
19:46:22,027 INFO  [stdout] (default task-109) 2020-06-30 19:46:22,027 INFO  [default task-109] org.docx4j.openpackaging.parts.DocPropsCustomPart:
19:46:22,028 INFO  [stdout] (default task-109)
19:46:22,028 INFO  [stdout] (default task-109) org.docx4j.openpackaging.parts.DocPropsCustomPart unmarshalled
19:46:22,028 INFO  [stdout] (default task-109)
19:46:22,028 INFO  [stdout] (default task-109)
19:46:22,140 INFO  [stdout] (default task-109) 2020-06-30 19:46:22,140 INFO  [default task-109] org.docx4j.XmlUtils: Using org.docx4j.org.apache.xalan.transformer.TransformerImpl
19:46:22,362 INFO  [stdout] (default task-109) 2020-06-30 19:46:22,361 INFO  [default task-109] org.docx4j.openpackaging.contenttype.ContentTypeManager: Detected WordProcessingML package
19:46:22,362 INFO  [stdout] (default task-109) 2020-06-30 19:46:22,362 INFO  [default task-109] org.docx4j.convert.in.FlatOpcXmlImporter: Creating org.docx4j.openpackaging.packages.WordprocessingMLPackage
19:46:22,364 INFO  [stdout] (default task-109) 2020-06-30 19:46:22,364 ERROR [default task-109] org.docx4j.convert.in.FlatOpcXmlImporter: prefix dcterms is not bound to a namespace
19:46:22,366 INFO  [stdout] (default task-109) 2020-06-30 19:46:22,365 ERROR [default task-109] org.docx4j.convert.in.FlatOpcXmlImporter: <cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"><dcterms:created xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="dcterms:W3CDTF">2020-03-03T08:48:00Z</dcterms:created><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Trainee</dc:creator><dc:description xmlns:dc="http://purl.org/dc/elements/1.1/"/><cp:keywords/><cp:lastModifiedBy>Julie Ann Tesorero</cp:lastModifiedBy><dcterms:modified xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="dcterms:W3CDTF">2020-06-18T09:55:00Z</dcterms:modified><cp:revision>36</cp:revision><dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/"/><dc:title xmlns:dc="http://purl.org/dc/elements/1.1/"/></cp:coreProperties>
19:46:22,367 ERROR [stderr] (default task-109) java.lang.IllegalArgumentException: prefix dcterms is not bound to a namespace
19:46:22,367 ERROR [stderr] (default task-109)  at com.sun.xml.bind.DatatypeConverterImpl._parseQName(DatatypeConverterImpl.java:370)
19:46:22,367 ERROR [stderr] (default task-109)  at com.sun.xml.bind.v2.runtime.unmarshaller.XsiTypeLoader.parseXsiType(XsiTypeLoader.java:96)
19:46:22,367 ERROR [stderr] (default task-109)  at com.sun.xml.bind.v2.runtime.unmarshaller.XsiTypeLoader.startElement(XsiTypeLoader.java:74)
19:46:22,367 ERROR [stderr] (default task-109)  at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:577)
19:46:22,368 ERROR [stderr] (default task-109)  at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:556)
19:46:22,368 ERROR [stderr] (default task-109)  at com.sun.xml.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:75)
19:46:22,368 ERROR [stderr] (default task-109)  at com.sun.xml.bind.v2.runtime.unmarshaller.SAXConnector.startElement(SAXConnector.java:168)
19:46:22,368 ERROR [stderr] (default task-109)  at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:244)
19:46:22,368 ERROR [stderr] (default task-109)  at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:281)
19:46:22,368 ERROR [stderr] (default task-109)  at com.sun.xml.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:250)
19:46:22,368 ERROR [stderr] (default task-109)  at com.sun.xml.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:127)
19:46:22,368 ERROR [stderr] (default task-109)  at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:369)
19:46:22,368 ERROR [stderr] (default task-109)  at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:352)
19:46:22,368 ERROR [stderr] (default task-109)  at org.docx4j.openpackaging.parts.JaxbXmlPart.unmarshal(JaxbXmlPart.java:1111)
19:46:22,369 ERROR [stderr] (default task-109)  at org.docx4j.convert.in.FlatOpcXmlImporter.getRawPart(FlatOpcXmlImporter.java:477)
19:46:22,369 ERROR [stderr] (default task-109)  at org.docx4j.convert.in.FlatOpcXmlImporter.getRawPart(FlatOpcXmlImporter.java:427)
19:46:22,369 ERROR [stderr] (default task-109)  at org.docx4j.convert.in.FlatOpcXmlImporter.getPart(FlatOpcXmlImporter.java:366)
19:46:22,369 ERROR [stderr] (default task-109)  at org.docx4j.convert.in.FlatOpcXmlImporter.addPartsFromRelationships(FlatOpcXmlImporter.java:296)
19:46:22,369 ERROR [stderr] (default task-109)  at org.docx4j.convert.in.FlatOpcXmlImporter.get(FlatOpcXmlImporter.java:222)
19:46:22,369 ERROR [stderr] (default task-109)  at org.docx4j.openpackaging.packages.WordprocessingMLPackage.transform(WordprocessingMLPackage.java:256)
19:46:22,369 ERROR [stderr] (default task-109)  at org.docx4j.openpackaging.packages.WordprocessingMLPackage.filter(WordprocessingMLPackage.java:295)
19:46:22,369 ERROR [stderr] (default task-109)  at org.docx4j.model.datastorage.migration.VariablePrepare.prepare(VariablePrepare.java:111)
19:46:22,369 ERROR [stderr] (default task-109)  at org.docx4j.model.datastorage.migration.VariablePrepare.prepare(VariablePrepare.java:78)
19:46:22,370 ERROR [stderr] (default task-109)  at sg.gov.ura.dax2.docmerge.core.impls.docx4j.Docx4jGenerator.processDocMergeTemplate(Docx4jGenerator.java:88)
19:46:22,370 ERROR [stderr] (default task-109)  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
19:46:22,370 ERROR [stderr] (default task-109)  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
19:46:22,370 ERROR [stderr] (default task-109)  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
19:46:22,370 ERROR [stderr] (default task-109)  at java.lang.reflect.Method.invoke(Method.java:498)
19:46:22,370 ERROR [stderr] (default task-109)  at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:338)
19:46:22,370 ERROR [stderr] (default task-109)  at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:197)
19:46:22,370 ERROR [stderr] (default task-109)  at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
19:46:22,370 ERROR [stderr] (default task-109)  at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:294)
19:46:22,371 ERROR [stderr] (default task-109)  at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98)
19:46:22,371 ERROR [stderr] (default task-109)  at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
....


The code is running on JBoss 7.1. I was previously using 6.1.2, but normalizing with that version produced a corrupt file.

I'm hoping someone can point me in the right direction on how to resolve this issue. I've searched around, and the nearest topic I could find is this:
https://www.docx4java.org/forums/docx-java-f6/reading-word-document-from-xml-file-t978.html?hilit=dcterms&sid=7ba6b7565dc357c0a49aa11d1fddfa11&sid=7ba6b7565dc357c0a49aa11d1fddfa11#p3235

But I'm not using the JARs mentioned there.
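
For anyone trying to narrow this down: the exception is raised while JAXB parses the value of xsi:type ("dcterms:W3CDTF") as a QName, which only works if the prefix can be resolved from the DOM being unmarshalled. The following is a minimal diagnostic sketch using only JDK classes, not docx4j code and not a confirmed explanation of the failure. The class and method names (XsiTypePrefixCheck, resolveXsiTypePrefix, resolveFromXml) are made up for illustration. It reports which namespace URI, if any, the prefix inside an xsi:type value resolves to.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only; not part of docx4j.
class XsiTypePrefixCheck {

    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    /** Returns the namespace URI bound to the prefix used in xsi:type, or null if unbound or unprefixed. */
    static String resolveXsiTypePrefix(Element el) {
        String xsiType = el.getAttributeNS(XSI_NS, "type");
        if (xsiType == null) {
            return null;
        }
        int colon = xsiType.indexOf(':');
        if (colon < 0) {
            return null;
        }
        // DOM Level 3 lookup of the prefix, starting at this element
        return el.lookupNamespaceURI(xsiType.substring(0, colon));
    }

    /** Depth-first search for the first element that carries an xsi:type attribute. */
    static Element firstElementWithXsiType(Node node) {
        if (node instanceof Element && ((Element) node).hasAttributeNS(XSI_NS, "type")) {
            return (Element) node;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            Element hit = firstElementWithXsiType(child);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    /** Parses the XML with a namespace-aware parser and resolves the first xsi:type prefix found. */
    static String resolveFromXml(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Element root = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Element typed = firstElementWithXsiType(root);
            return typed == null ? null : resolveXsiTypePrefix(typed);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Shortened copy of the core properties XML from the error log above
        String xml = "<cp:coreProperties xmlns:cp=\"http://schemas.openxmlformats.org/package/2006/metadata/core-properties\">"
                + "<dcterms:created xmlns:dcterms=\"http://purl.org/dc/terms/\""
                + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
                + " xsi:type=\"dcterms:W3CDTF\">2020-03-03T08:48:00Z</dcterms:created>"
                + "</cp:coreProperties>";
        System.out.println("dcterms resolves to: " + resolveFromXml(xml));
    }
}
```

Calling resolveXsiTypePrefix on elements of the DOM that actually reaches the unmarshaller (rather than on a freshly parsed copy) may show whether the prefix binding was lost before JAXB saw it.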

Re: Normalizing docx using VariablePrepare

Posted: Thu Jul 02, 2020 11:38 am
by jason

Re: Normalizing docx using VariablePrepare

Posted: Fri Jul 03, 2020 3:39 pm
by kyerie030
Hi Jason,

Because the Stack Overflow reply format makes it hard to share the code I tried, I'll post it here as well.

Code below:
Code:
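import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.bind.Unmarshaller;
import javax.xml.parsers.DocumentBuilder;
import org.docx4j.XmlUtils;
import org.docx4j.jaxb.Context;
import org.docx4j.jaxb.JaxbValidationEventHandler;

public Map<String, Object> processDocMergeTemplate(DocMergeRequest docMergeRequest) {
    Map<String, Object> resultMap = new HashMap<>();

    try {
        // Parse the test XML file into a DOM using docx4j's DocumentBuilder
        DocumentBuilder db = XmlUtils.getNewDocumentBuilder();
        org.w3c.dom.Document dom = db.parse(new File("D:\\test\\foo.xml"));
        System.out.println("dom.getClass().getName(): " + dom.getClass().getName());

        // Unmarshal with the core properties JAXB context, failing on the first validation event
        Unmarshaller u = Context.jcDocPropsCore.createUnmarshaller();
        JaxbValidationEventHandler eventHandler = new JaxbValidationEventHandler();
        eventHandler.setContinue(false);
        u.setEventHandler(eventHandler);

        Object o = XmlUtils.unwrap(u.unmarshal(dom.getDocumentElement()));

        System.out.println("o.getClass().getName(): " + o.getClass().getName());

    } catch (Exception e) {
        // Record the failure on the request instead of rethrowing
        docMergeRequest.setErrorMessage(e.getMessage());
    }

    return resultMap;
}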


And result as follows:

Code:
09:34:14,626 INFO  [stdout] (default task-110) 2020-07-03 09:34:14,625 INFO  [default task-110] org.docx4j.XmlUtils: setProperty com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
09:34:14,627 INFO  [stdout] (default task-110) 2020-07-03 09:34:14,626 INFO  [default task-110] org.docx4j.XmlUtils: actual: com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
09:34:14,627 INFO  [stdout] (default task-110) 2020-07-03 09:34:14,627 INFO  [default task-110] org.docx4j.XmlUtils: setProperty com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
09:34:14,628 INFO  [stdout] (default task-110) 2020-07-03 09:34:14,628 INFO  [default task-110] org.docx4j.XmlUtils: actual: com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
09:34:14,660 INFO  [stdout] (default task-110) dom.getClass().getName(): com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl
09:34:14,663 INFO  [stdout] (default task-110) 2020-07-03 09:34:14,662 INFO  [default task-110] org.docx4j.jaxb.Context: java.vendor=Oracle Corporation
09:34:14,664 INFO  [stdout] (default task-110) 2020-07-03 09:34:14,663 INFO  [default task-110] org.docx4j.jaxb.Context: java.version=1.8.0_222-4-redhat
09:34:14,665 INFO  [stdout] (default task-110) 2020-07-03 09:34:14,664 INFO  [default task-110] org.docx4j.jaxb.Context: java.vm.name=OpenJDK 64-Bit Server VM
09:34:18,589 INFO  [stdout] (default task-110) 2020-07-03 09:34:18,588 INFO  [default task-110] org.docx4j.jaxb.Context: JAXB Reference Implementation is in use.
09:34:18,767 INFO  [stdout] (default task-110) o.getClass().getName(): org.docx4j.docProps.core.CoreProperties

Re: Normalizing docx using VariablePrepare

Posted: Fri Jul 03, 2020 5:39 pm
by kyerie030
I figured out how to log only the docx4j output. I'm attaching the log file here for your reference.
I had to zip it because it is larger than 32 KB.