XmlUtils.unmarshallFromTemplate method for construct docx

Posted: Tue Nov 25, 2008 2:53 pm
by fiorenzo
Hi,
I'm developing an HTML-to-docx converter in a J2EE application.
I would like to use the XmlUtils.unmarshallFromTemplate method to construct my docx,
but I don't know the correct syntax for generating the basic elements.
For example, this code is incorrect:
Code:
<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">
<w:pPr>
  <w:numPr>
   <w:ilvl w:val="0" />
   <w:numId w:val="5" />
  </w:numPr>
  </w:pPr>
<w:r>
     <w:t>Level one item one</w:t>
</w:r>
</w:p>


Can someone help me?

Thanks in advance

Fiorenzo

Re: XmlUtils.unmarshallFromTemplate method for construct docx

Posted: Wed Nov 26, 2008 5:02 am
by jason
Hi

At a glance, it looks to me like you ought to be able to use unmarshallFromTemplate or unmarshalString to create a org.docx4j.wml.P object from the w:p you have listed. The only thing which looks odd is that some of the quotation marks in your snippet are escaped, and others aren't. Fix that, then tell us what happens?
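
To make the quoting point concrete, here is a minimal sketch that uses only the JDK's XML parser, so it runs without docx4j on the classpath. The class and method names (ParagraphXmlCheck, parseRoot) are invented for the example. It holds the w:p from the first post in a Java string literal with every quote escaped the same way, and checks that the result is well-formed. In docx4j, a string like PARAGRAPH_XML is what you would hand to XmlUtils.unmarshalString; that call is deliberately not made here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ParagraphXmlCheck {

    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // The w:p from the first post as a Java string literal. Every quote inside
    // the literal is escaped for Java, so the XML itself contains plain quotes.
    static final String PARAGRAPH_XML =
          "<w:p xmlns:w=\"" + W_NS + "\">"
        + "<w:pPr><w:numPr>"
        + "<w:ilvl w:val=\"0\"/>"
        + "<w:numId w:val=\"5\"/>"
        + "</w:numPr></w:pPr>"
        + "<w:r><w:t>Level one item one</w:t></w:r>"
        + "</w:p>";

    // Parses the string with a namespace-aware parser and returns the root
    // element. Throws IllegalArgumentException if the XML is not well-formed.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        Element p = parseRoot(PARAGRAPH_XML);
        System.out.println(p.getNamespaceURI() + " " + p.getLocalName());

        // If a backslash ends up in the XML text itself, the parser rejects it.
        try {
            parseRoot("<w:p xmlns:w=\\\"" + W_NS + "\\\"/>");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: backslash before the attribute quote");
        }
    }
}
```

If the first call succeeds and the second is rejected, the escaping in your own snippet is the thing to check first.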

If your objective is to convert html to docx, you might have a look at the proof of concept org.docx4j.openpackaging.packages.html2wordml.xslt
(Are you already using XSLT to solve the problem, or some other approach?)

That transforms html (assumed to be just one or more <p>) into a w:sdtContent. The code below gets an org.docx4j.wml.SdtContentBlock object from it.

Code:
// Convert htmlString to WordML via XSLT.

// Strip any &nbsp; entities, since Xalan rejects them as undeclared.
// (html_nbsp and html_nbsp_replacement are String constants that are
// not shown in this snippet.)
if (htmlString.indexOf("&nbsp;") > -1) {
    htmlString = htmlString.replace(html_nbsp, html_nbsp_replacement);
}

// The transform reads from a Source, so wrap the string in a StreamSource.
javax.xml.transform.stream.StreamSource ss
    = new javax.xml.transform.stream.StreamSource(new java.io.StringReader(htmlString));

java.io.InputStream xslt = org.docx4j.utils.ResourceUtils.getResource(
    "org/docx4j/openpackaging/packages/html2wordml.xslt");

javax.xml.transform.dom.DOMResult wmlResult = new javax.xml.transform.dom.DOMResult();

// Debug: send the output to pw instead of a DOM.
//javax.xml.transform.stream.StreamResult wmlResult = new javax.xml.transform.stream.StreamResult(pw);

org.docx4j.XmlUtils.transform(ss, xslt, null, wmlResult);

// For convenience, that results in an sdtContent element.
javax.xml.bind.JAXBContext jc = org.plutext.Context.jcTransforms;
javax.xml.bind.Unmarshaller u = jc.createUnmarshaller();
u.setEventHandler(new org.docx4j.jaxb.JaxbValidationEventHandler());

org.docx4j.wml.SdtContentBlock sdtContent
    = (org.docx4j.wml.SdtContentBlock) u.unmarshal(wmlResult.getNode());


You quite probably don't want to make an sdtContent object, in which case you'll need to change the transform a bit.
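
As a rough idea of what a transform that emits a plain w:p (rather than an sdtContent) could look like, here is a minimal sketch using only the JDK's javax.xml.transform classes. The stylesheet is a made-up one-template example, not the html2wordml.xslt that ships with docx4j, and the class name HtmlParagraphToWml is hypothetical. It also assumes the input is a single well-formed <p> with no markup inside that needs preserving, and it swaps &nbsp; for the numeric reference &#160; as one possible way around the undeclared-entity problem.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlParagraphToWml {

    // Hypothetical minimal stylesheet: maps a root <p> to w:p/w:r/w:t.
    // This is NOT the html2wordml.xslt from docx4j.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
        + "<xsl:template match=\"/p\">"
        + "<w:p><w:r><w:t><xsl:value-of select=\".\"/></w:t></w:r></w:p>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Transforms one HTML paragraph into a WordML paragraph, returned as a string.
    static String transform(String html) {
        // &nbsp; is not declared in plain XML, so use the numeric reference instead
        // (an assumption; the original snippet does not show its replacement value).
        String xml = html.replace("&nbsp;", "&#160;");
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Transform failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<p>Hello&nbsp;world</p>"));
    }
}
```

A real stylesheet would need further templates for bold, italics, lists and so on; the resulting string (or a DOMResult, as in the code above) could then be unmarshalled to a org.docx4j.wml.P.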

If you are trying to convert entire HTML documents (rather than a paragraph or two), a more general approach might be appropriate: have the XSLT output the entire main document part. Alternatively, output in pkg format (which includes all the parts in a single XML document, so you could potentially convert CSS into a styles part), and then use org.docx4j.convert.in.XmlPackage to create a new WordML package.

Hope this helps,

Jason

Re: XmlUtils.unmarshallFromTemplate method for construct docx

Posted: Thu Dec 04, 2008 2:15 pm
by fiorenzo
Hi Jason,

I wrote a small HTML-to-XML parser: I use htmlparser (http://htmlparser.sourceforge.net/) to traverse the HTML produced by TinyMCE (http://tinymce.moxiecode.com/), a JavaScript wrapper for a textarea.
I chose this solution because I already know the htmlparser library.
My knowledge of XSLT is very limited, and so is my time.


My earlier mistake was this:
with the unmarshal method, only one part (a single XML root element) can be attached to the main document at a time.
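
To illustrate the single-root point, here is a minimal sketch that uses only the JDK's XML parser rather than docx4j. The class and method names (SingleRootDemo, paragraph, isSingleDocument) are made up for the example. Two w:p elements concatenated into one string are not a single XML document, while each one on its own is.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class SingleRootDemo {

    static final String NS_DECL =
        "xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"";

    // Builds a one-run paragraph. The text is assumed to contain no XML
    // special characters such as < or &.
    static String paragraph(String text) {
        return "<w:p " + NS_DECL + "><w:r><w:t>" + text + "</w:t></w:r></w:p>";
    }

    // Returns true if the string parses as one XML document,
    // which requires exactly one root element.
    static boolean isSingleDocument(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String first = paragraph("Level one item one");
        String second = paragraph("Level one item two");

        // Two root elements in one string: rejected.
        System.out.println(isSingleDocument(first + second)); // false

        // One root element per call: accepted.
        System.out.println(isSingleDocument(first));  // true
        System.out.println(isSingleDocument(second)); // true
    }
}
```

So each paragraph has to be unmarshalled in its own call and then added to the document one at a time.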

Fiorenzo