
HTML to docx

Posted: Fri Dec 19, 2014 7:17 pm
by M.M
Hello! Can anyone give me some sample code to convert HTML to docx? I have found some links here, but all of them point to GitHub, which I cannot access for some reason. The closest thing I have found so far is this line:

wordMLPackage.getMainDocumentPart().getContent().addAll(XHTMLImporter.convert(htmlText, null, wordMLPackage));

If I understand correctly, htmlText is a string containing the HTML. I added docx4j-ImportXHTML-3.2.1.jar and I am using docx4j-3.2.1, but the class XHTMLImporter is not recognized. Obviously I am doing something wrong, but what?


I have also found this code:
Code:
// Read the XHTML source from disk
String stringFromFile = FileUtils.readFileToString(
        new File(destFolder + "\\" + xhtmlFileName), "UTF-8");

WordprocessingMLPackage docxOut = WordprocessingMLPackage.createPackage();

// Add a numbering part so that imported lists can be numbered
NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
docxOut.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();

// Convert the XHTML and append the result to the document body
XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(docxOut);
XHTMLImporter.setHyperlinkStyle("Hyperlink");
docxOut.getMainDocumentPart().getContent()
        .addAll(XHTMLImporter.convert(stringFromFile, null));

docxOut.save(new java.io.File(destFolder + "\\" + docxFileName));


This is probably doing what I want, but I get a java.lang.NoClassDefFoundError: org/docx4j/org/xhtmlrenderer/layout/Styleable error. I have also added Xalan-2.7.0.jar and still get the same error. Any help or hint would be appreciated.

Re: HTML to docx

Posted: Fri Dec 19, 2014 7:51 pm
by jason
M.M wrote: The class XHTMLImporter is not recognized. Obviously I am doing something wrong, but what?


Sounds like you don't actually have docx4j-ImportXHTML-3.2.1.jar on your classpath. Or perhaps you are missing a dependency?
(Hard to tell when you don't provide a stack trace!)

M.M wrote: Can anyone give me some sample code to convert HTML to docx?


Where is your XHTML (e.g. in a file, a string, or a URL)?

Re: HTML to docx

Posted: Fri Dec 19, 2014 7:55 pm
by M.M
The HTML is in a file. I am now using the second code sample I posted in my question, and I still get the same error.

The stack trace of the error:
Code:
Exception in thread "main" java.lang.NoClassDefFoundError: org/docx4j/org/xhtmlrenderer/layout/Styleable
   at test_Image_docx4j.AddingAnInlineImage.main(AddingAnInlineImage.java:72)
Caused by: java.lang.ClassNotFoundException: org.docx4j.org.xhtmlrenderer.layout.Styleable
   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   ... 1 more


Line 72 is:
Code:
XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(docxOut);

Re: HTML to docx

Posted: Fri Dec 19, 2014 8:49 pm
by jason

Re: HTML to docx

Posted: Fri Dec 19, 2014 9:16 pm
by M.M
Thanks a lot, jason. I also had to add the itext-2.1.7.jar file, and then it worked just fine.
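
For anyone hitting the same NoClassDefFoundError, a quick way to see which jar is still missing is to ask the class loader directly before running the conversion. The following is only a rough sketch, not part of docx4j: the ClasspathCheck class and its isPresent and check methods are hypothetical helpers written for this illustration. The first two class names are taken from the code and stack trace in this thread (assuming XHTMLImporterImpl lives in org.docx4j.convert.in.xhtml), and com.lowagie.text.Document is assumed to be a representative class from itext-2.1.7.jar. It also assumes Java 8 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reports whether given classes can be loaded from the classpath. */
final class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised for a missing dependency
            return false;
        }
    }

    /** Checks each name in the given order and returns a map of name -> found. */
    static Map<String, Boolean> check(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isPresent(name));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Boolean> report = check(
                // assumed package of the importer class used in this thread
                "org.docx4j.convert.in.xhtml.XHTMLImporterImpl",
                // class named in the stack trace above
                "org.docx4j.org.xhtmlrenderer.layout.Styleable",
                // assumed to be provided by itext-2.1.7.jar
                "com.lowagie.text.Document");
        report.forEach((name, found) ->
                System.out.println((found ? "found    " : "MISSING  ") + name));
    }
}
```

Run it with the same classpath as the failing program. Any line marked MISSING points at a jar that still has to be added.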