HTML with <li></li> to docx

Posted: Fri Apr 12, 2019 1:42 am
by schyzo
Hello,

I recently discovered docx4j, and I am trying to use it to convert docx files to HTML, and then from HTML back to docx.
I am using docx4j 6.1.2 and docx4j-ImportXHTML 6.1.0.

After some testing I got the first step (docx to HTML) working, but I am now stuck on the second one.

Here is my code:
Code:
File html = new File("C:\\Users\\guest\\Downloads\\tmp\\24-20-01-062.html");
         
WordprocessingMLPackage wordPackage = WordprocessingMLPackage.createPackage();
         
NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
wordPackage.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();
         
XHTMLImporterImpl xhtmlImporter = new XHTMLImporterImpl(wordPackage);
xhtmlImporter.setHyperlinkStyle("Hyperlink");
System.out.println("---------- CONVERSION -------------");
wordPackage.getMainDocumentPart().getContent().addAll(xhtmlImporter.convert(html, null));
System.out.println("---------- END CONVERSION -------------");
         
OutputStream out = new FileOutputStream("C:\\Users\\guest1477\\Downloads\\tmpOut\\24-20-01-062.docx");
wordPackage.save(out);


When I run the conversion, I always get a NullPointerException at XHTMLImporterImpl.class:1117, on the line "listHelper.peekListItemStateStack().init();".
Stepping through in debug mode, I found that the ListItemStateStack is filled when the importer meets a "<ul>" or "<ol>" block in the HTML file. Later on, the code accesses this stack for "<li>" blocks, but at that point the stack is empty, so the peek method returns null and the call to init() throws the NPE.
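
The failure mode described above can be modelled in a few lines of plain Java. This is only a sketch: ListItemState and ListStackDemo are hypothetical names, not docx4j classes, and the importer's real internals may differ. It shows how peeking at an empty stack yields null, and how a null check turns the crash into a detectable condition.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the importer's per-list state; not a docx4j class.
class ListItemState {
    boolean initialised = false;

    void init() {
        initialised = true;
    }
}

// Hypothetical model of a list-state stack driven by <ul>/<ol>/<li> events.
class ListStackDemo {
    private final Deque<ListItemState> stack = new ArrayDeque<>();

    // Called when a <ul> or <ol> opens.
    void enterList() {
        stack.push(new ListItemState());
    }

    // Called when a <ul> or <ol> closes.
    void leaveList() {
        stack.pop();
    }

    // Called for each <li>. Returns false when no list is open,
    // instead of dereferencing the null that peek() returns.
    boolean handleListItem() {
        ListItemState top = stack.peek(); // null when the stack is empty
        if (top == null) {
            return false;
        }
        top.init();
        return true;
    }

    public static void main(String[] args) {
        ListStackDemo demo = new ListStackDemo();
        System.out.println("li outside a list handled: " + demo.handleListItem());
        demo.enterList();
        System.out.println("li inside a list handled: " + demo.handleListItem());
    }
}
```

Calling top.init() without the null check on the first call is the equivalent of the NPE reported above.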

Am I missing something?

Thank you very much for your help.

Re: HTML with <li></li> to docx

Posted: Fri Apr 12, 2019 6:35 am
by jason
Please attach simple XHTML exhibiting your issue.

Re: HTML with <li></li> to docx

Posted: Fri Apr 12, 2019 6:10 pm
by schyzo
Hello,
Please find attached the HTML file, which was originally generated by docx4j from a docx file.
Thank you for your help.

Re: HTML with <li></li> to docx

Posted: Mon Apr 15, 2019 7:38 pm
by schyzo
Ah, I think I found something.
The problem is that the generated HTML contains the <li></li> elements directly, without the surrounding <ul></ul> (or <ol>).
So it seems that the problem actually comes from the first step...
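
One possible stopgap, if the exported HTML cannot be changed, is to wrap the orphan <li> elements in a <ul> before importing. The sketch below uses only the JDK's DOM API. OrphanLiWrapper and its methods are hypothetical helpers, not part of docx4j, and it assumes the input is well-formed XML with no DOCTYPE or HTML-only entities.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
class OrphanLiWrapper {

    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    static boolean isList(Node n) {
        if (n == null || n.getNodeType() != Node.ELEMENT_NODE) return false;
        String name = n.getLocalName();
        return "ul".equals(name) || "ol".equals(name);
    }

    // Previous sibling, skipping whitespace-only text nodes.
    static Node previousNonBlank(Node n) {
        Node p = n.getPreviousSibling();
        while (p != null && p.getNodeType() == Node.TEXT_NODE
                && p.getTextContent().trim().isEmpty()) {
            p = p.getPreviousSibling();
        }
        return p;
    }

    // Moves every <li> whose parent is not <ul>/<ol> into a new <ul>.
    // Consecutive orphans share one <ul>. Returns the number of items moved.
    static int wrapOrphans(Document doc) {
        NodeList live = doc.getElementsByTagNameNS("*", "li");
        List<Element> items = new ArrayList<>(); // snapshot: the NodeList is live
        for (int i = 0; i < live.getLength(); i++) items.add((Element) live.item(i));

        Element lastCreated = null;
        int moved = 0;
        for (Element li : items) {
            Node parent = li.getParentNode();
            if (isList(parent)) continue;
            Node prev = previousNonBlank(li);
            if (lastCreated == null || prev != lastCreated) {
                lastCreated = doc.createElementNS(li.getNamespaceURI(), "ul");
                parent.insertBefore(lastCreated, li);
            }
            lastCreated.appendChild(li); // appendChild moves the node
            moved++;
        }
        return moved;
    }

    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<body><p>Intro</p><li>one</li><li>two</li></body>");
        System.out.println(wrapOrphans(doc) + " item(s) wrapped");
        System.out.println(serialize(doc));
    }
}
```

The serialized string could then be handed to the importer instead of the original file. Note that this always produces a bulleted list, so any numbering information from the original docx is lost. Fixing the export step is likely the better route.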

Maybe I have to add something in my code? Here it is:
Code:
try {
   InputStream docx = new FileInputStream("C:/Users/guest/Downloads/24-20-01-062.docx");
   try {
      WordprocessingMLPackage lWordPackage = Docx4J.load(docx);
      FieldUpdater updater = new FieldUpdater(lWordPackage);
      updater.update(true);
      
      HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
      htmlSettings.setImageDirPath("C:/Users/guest/Downloads/tmp");
      htmlSettings.setImageTargetUri("C:/Users/guest/Downloads/tmp");
      htmlSettings.setWmlPackage(lWordPackage);
      
      Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);
      
      OutputStream out = new FileOutputStream("C:/Users/guest/Downloads/tmp/24-20-01-062.html");
      Docx4J.toHTML(htmlSettings, out, Docx4J.FLAG_EXPORT_PREFER_XSL);
      out.close();
      
   } catch (Docx4JException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
   }
   
   docx.close();
} catch (IOException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
}

Re: HTML with <li></li> to docx

Posted: Mon Apr 15, 2019 8:07 pm
by jason
Try
Code:
SdtWriter.registerTagHandler("HTML_ELEMENT", new SdtToListSdtTagHandler());

On the import side, see https://github.com/plutext/docx4j-ImportXHTML/issues/21

Re: HTML with <li></li> to docx

Posted: Mon Apr 15, 2019 8:44 pm
by schyzo
Ah perfect, thank you very much :).

Have a good day!