
Converting raw html to docx

Posted: Tue Jan 08, 2013 8:38 am
by mike v-c
My web page takes raw HTML from an editor, and I am trying to use docx4j to convert it to a docx file. But the resulting docx file is stripped of most of its formatting. For example, all headings look the same: whether they are tagged as h1, h2 or h3, they all look like an h3. Underlining is not working at all. I've tried the following:

Code:
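    import java.util.Map;

    import org.docx4j.convert.in.xhtml.XHTMLImporter;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
    import org.docx4j.openpackaging.parts.WordprocessingML.StyleDefinitionsPart;
    import org.docx4j.wml.Style;

    // Create an empty docx package
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();

    // Try to activate every style docx4j knows about
    Map<String, Style> map = StyleDefinitionsPart.getKnownStyles();
    for (Style s : map.values()) {
        wordMLPackage.getMainDocumentPart().getPropertyResolver().activateStyle(s);
    }

    // Convert the editor's XHTML and append the result to the document body
    wordMLPackage.getMainDocumentPart().getContent().addAll(
            XHTMLImporter.convert(xhtmlString.toString(), null, wordMLPackage));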

What am I doing wrong?

Re: Converting raw html to docx

Posted: Tue Jan 08, 2013 10:14 am
by Frobl
I'm not an expert; I only started using docx4j yesterday. But after running into the same problems, I would recommend making sure your HTML is valid XHTML. If that does not solve the problem, you might have to define a CSS style for h1 and h2.
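
Both suggestions can be tried before the string ever reaches docx4j. The following is only a rough sketch using the JDK's XML parser, not anything from docx4j itself: the class name `XhtmlPrep`, the helpers `wrap` and `isWellFormed`, and the CSS values are all made up for illustration. It wraps the editor fragment in a complete XHTML document with an embedded stylesheet for h1, h2, h3 and u, then checks that the result is well-formed XML. Whether docx4j honors each CSS rule may depend on the version in use.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class XhtmlPrep {

    // Hypothetical stylesheet: the sizes are illustrative, not docx4j defaults.
    static final String CSS =
          "h1 { font-size: 24pt; font-weight: bold; }\n"
        + "h2 { font-size: 18pt; font-weight: bold; }\n"
        + "h3 { font-size: 14pt; font-weight: bold; }\n"
        + "u { text-decoration: underline; }\n";

    /** Wraps an editor fragment in a full XHTML document with embedded CSS. */
    public static String wrap(String fragment) {
        return "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
             + "<head><title>export</title>"
             + "<style type=\"text/css\">" + CSS + "</style></head>"
             + "<body>" + fragment + "</body></html>";
    }

    /** Returns true if the string parses as well-formed XML. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler keeps the parser from printing to stderr;
            // fatal errors are still thrown as SAXException.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. an unclosed <br> or an undeclared entity
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Well-formed: every tag is closed.
        System.out.println(isWellFormed(wrap("<h1>Title</h1><p><u>text</u></p>")));
        // Not well-formed: <br> is never closed, which is legal HTML but not XHTML.
        System.out.println(isWellFormed(wrap("<p>line<br>next</p>")));
    }
}
```

If the check passes, the wrapped string could be handed to `XHTMLImporter.convert` in place of `xhtmlString.toString()` in the original code. If it fails, the editor output probably needs cleaning up first (closing tags such as `<br/>`, quoting attributes, replacing entities like `&nbsp;` with numeric references).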