sending rich text or html to docx

Posted: Tue Nov 16, 2010 7:37 am
by cscott530
Is it possible to send (without manual parsing and conversion) rich text and/or HTML to docx and have it render properly?

In my application, we are using a rich text editor that is capable of spitting out the text in either of those formats. I am trying to create a proof of concept with docx4j and this would be a major coup for its adoption on the project, as we were unable to find a solution to the same problem in the past using other APIs.

Thanks!

Re: sending rich text or html to docx

Posted: Tue Nov 16, 2010 8:46 am
by jason
I'd be interested in helping you to have docx4j parse HTML automatically.

At present, it has some ability to do this. Many of the classes in org.docx4j.model.properties have a constructor which takes an org.w3c.dom.css.CSSValue, and I have used this to convert HTML edited by CKEditor back to WordML. However, the original HTML came out of docx4j, so it may not cope so well with arbitrary HTML.

I do have a class for converting HTML tables as well (which I will look at committing today).

Another approach you can use is to embed the HTML (or RTF) as a w:altChunk. But then you need Word to turn it into normal docx content for you (my docx4j extension only does this for an altChunk of type docx4j at the moment, but I'd like to have it do the same for HTML).

Re: sending rich text or html to docx

Posted: Wed Nov 17, 2010 2:37 am
by cscott530
Thanks for the input, jason.

I tried the altChunk strategy and had excellent results. It took some trial and error to get the content type and the part names right, but once I did, it has been working perfectly.

Here's a snippet of the code I use to do it. For readability's sake: altChunkCount is just a static int I use to avoid duplicate part names, and factory is just the standard ObjectFactory.
Code:
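import org.docx4j.openpackaging.contenttype.ContentType;
import org.docx4j.openpackaging.parts.PartName;
import org.docx4j.openpackaging.parts.WordprocessingML.AlternativeFormatInputPart;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.relationships.Relationship;
import org.docx4j.wml.CTAltChunk;

public static void insertHtml(MainDocumentPart main, String html) {
   try {
      // Wrap bare fragments so the part contains a complete <html> document.
      html = html.trim();
      if (!html.startsWith("<html>")) {
         html = "<html>" + html;
      }
      if (!html.endsWith("</html>")) {
         html = html + "</html>";
      }

      // Store the HTML as its own part; the counter keeps part names unique.
      AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(new PartName("/test" + altChunkCount++ + ".html"));
      afiPart.setContentType(new ContentType("text/html"));
      // Note: getBytes() with no argument uses the JVM's default charset.
      afiPart.setBinaryData(html.getBytes());
      Relationship altChunkRel = main.addTargetPart(afiPart);

      // Reference the part from the document body via a w:altChunk element.
      CTAltChunk chunk = factory.createCTAltChunk();
      chunk.setId(altChunkRel.getId());

      main.addObject(chunk);
   }
   catch (Exception e) {
      e.printStackTrace();
   }
}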
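
In case it helps anyone adapting the snippet above, here is a rough sketch of the two bits that don't touch docx4j at all: the html wrapping and the unique part names. AltChunkHtmlHelper, ensureHtmlRoot and nextPartName are names made up for this example, not anything from docx4j. It assumes you would rather leave input alone when it already starts with an html tag in any letter case or with attributes (which the startsWith("<html>") check above would wrap a second time), and that insertHtml might be called from more than one thread, so the counter is an AtomicInteger instead of a plain static int.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration only; it is not part of docx4j.
final class AltChunkHtmlHelper {

    // Thread-safe stand-in for the static int altChunkCount.
    private static final AtomicInteger COUNTER = new AtomicInteger();

    private AltChunkHtmlHelper() {
    }

    // Returns the trimmed input as-is if it already starts with an <html ...>
    // tag or a doctype (any letter case); otherwise wraps it in <html>...</html>.
    static String ensureHtmlRoot(String html) {
        String trimmed = html.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (lower.startsWith("<html") || lower.startsWith("<!doctype")) {
            return trimmed;
        }
        return "<html>" + trimmed + "</html>";
    }

    // Produces names in the same "/testN.html" pattern as the snippet above,
    // with a different N on every call.
    static String nextPartName() {
        return "/test" + COUNTER.getAndIncrement() + ".html";
    }

    public static void main(String[] args) {
        System.out.println(ensureHtmlRoot("<p>Hello</p>"));
        System.out.println(nextPartName());
    }
}
```

With something like this in place, insertHtml could call ensureHtmlRoot(html) instead of the two if blocks and pass nextPartName() to the PartName constructor. The rest of the method would stay the same.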