Hello. I am trying to do a round trip from docx to XHTML and back. As a sample file I am using:
https://github.com/plutext/docx4all/blo ... reSet.docx
The code I'm using to generate the XHTML is, more or less, this:
https://github.com/plutext/docx4j/blob/ ... tHtml.java
Is there a ConvertInHtml example that does the exact opposite of the file above? I am trying to work out whether docx4j can be used for this round trip. I am aware of the docx4j importer project (docx4j-ImportXHTML), but I have not found any useful code examples there, because all of them are years old (whereas ConvertOutHtml is quite new). The closest I found is:
https://github.com/plutext/docx4j-Impor ... LFile.java
Using something similar to the above, I have managed to get a partial reverse conversion working. I had to use file:/// URLs to make images work, and I had to use a blank template document to generate headers and footers.
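For anyone wanting to reproduce the image workaround, here is a minimal sketch of rewriting relative image paths into file:/// URLs before the XHTML is handed to the importer. It uses only the JDK. ImageSrcRewriter and rewrite are hypothetical names, not part of docx4j, and this is not the exact code used in the experiment above. It assumes images are referenced through double-quoted src attributes and that relative paths resolve against a known base directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a docx4j class): turns relative src values into file: URLs.
class ImageSrcRewriter {

    // Matches double-quoted src="..." attributes. A regex is enough for a sketch;
    // a real XHTML parser would be more robust. Note it also matches src on
    // non-image elements such as script.
    private static final Pattern SRC = Pattern.compile("src=\"([^\"]*)\"");

    static String rewrite(String xhtml, Path baseDir) {
        Matcher m = SRC.matcher(xhtml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String src = m.group(1);
            String replacement = src;
            // Leave values that already carry a scheme (http:, file:, data:) untouched.
            if (!src.matches("[a-zA-Z][a-zA-Z0-9+.-]*:.*")) {
                // Resolve against the base directory and convert to a file: URI.
                replacement = baseDir.resolve(src).normalize().toUri().toString();
            }
            m.appendReplacement(out,
                    Matcher.quoteReplacement("src=\"" + replacement + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Usage would be along the lines of `ImageSrcRewriter.rewrite(xhtml, Paths.get("/path/to/export"))`, with the result passed to the importer instead of the original XHTML.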
Unfortunately there seem to be bugs with:
1. Numbering (bullets work fine).
2. Spacing. For some reason, the generated document has a lot of extra spacing before and after paragraphs. I'll need to see whether I can set it to zero.
3. Tables. They are recreated too wide after exporting to XHTML and don't fit on screen.
4. Headers and footers. Because we use a template for them, I'll need to hardcode their removal in the export (headers and footers don't work in the XHTML import, right?)
5. Deletions in the docx are recreated as red text with a strikethrough line ...
6. Insertions don't work.
7. Fonts such as Calibri (Body) don't seem to work in the docx to XHTML conversion. They get converted to standard Calibri.
8. Image scaling is wrong.
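On the spacing point (item 2), one thing that could be tried is overriding paragraph margins in the XHTML before it is imported. This is a minimal sketch that assumes the importer translates CSS margin-top and margin-bottom into paragraph spacing, which is not verified here. CssOverrideInjector and inject are hypothetical names, not part of docx4j.

```java
// Hypothetical helper (not a docx4j class): adds a <style> element to the XHTML head.
class CssOverrideInjector {

    // Assumed override: zero space before and after every paragraph.
    static final String ZERO_PARAGRAPH_SPACING =
            "p { margin-top: 0; margin-bottom: 0; }";

    static String inject(String xhtml, String css) {
        // The CSS is inserted verbatim, so it must not contain '<' or '&',
        // otherwise the XHTML would no longer be well-formed.
        String style = "<style type=\"text/css\">" + css + "</style>";
        int idx = xhtml.indexOf("</head>");
        if (idx < 0) {
            // No head element found; return the input unchanged.
            return xhtml;
        }
        // Insert last in the head so the rule comes after earlier style blocks.
        return xhtml.substring(0, idx) + style + xhtml.substring(idx);
    }
}
```

Whether this rule wins depends on the importer's cascade handling: inline style attributes produced by the export would still take precedence over it.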
It's a bit odd that I noticed so many bugs immediately when using these APIs, which seem rather mature.