
XHTML img to docx conversion

Posted: Fri Feb 20, 2015 3:25 am
by VeXaL
I'm using XHTMLImporterImpl with WordprocessingMLPackage to convert HTML to docx, and I'm having a problem with HTML img tags that define a height or width value.
Currently I'm using docx4j v3.2.1 and docx4j-ImportXHTML v3.2.2.
Example:
Code:
<img src="img_sample1.png" height="250" />


In these cases the images come out extremely small in the exported docx. Removing the height attribute makes them render properly.
I've traced the problem to the unit conversion, specifically to the addImage() method of XHTMLImporterImpl, which gets pixel units from box.getHeight() and converts them as if they were twips.

Code:
private void addImage(BlockBox box) {

    Long cx = (box.getStyle().valueByName(CSSName.WIDTH) == IdentValue.AUTO) ? null :
            UnitsOfMeasurement.twipToEMU(box.getWidth());
    Long cy = (box.getStyle().valueByName(CSSName.HEIGHT) == IdentValue.AUTO) ? null :
            UnitsOfMeasurement.twipToEMU(box.getHeight());

    xHTMLImageHandler.addImage(renderer.getDocx4jUserAgent(), wordMLPackage,
            this.getCurrentParagraph(true), box.getElement(), cx, cy);

}
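
To illustrate how much a skipped px-to-twip step can shrink an image, here is a minimal arithmetic sketch. It does not call docx4j; the class name, the helper methods and the 96 DPI figure are assumptions made for illustration only, and the DPI docx4j actually uses may differ.

```java
// Sketch only: plain unit arithmetic, no docx4j calls.
// ImageUnitSketch and its helpers are hypothetical names.
class ImageUnitSketch {
    static final long TWIPS_PER_INCH = 1440;   // 20 twips per point, 72 points per inch
    static final long EMU_PER_INCH = 914400;   // which gives 635 EMU per twip
    static final long ASSUMED_DPI = 96;        // assumption, not taken from the thread

    // Pixels to twips at the assumed DPI (15 twips per pixel at 96 DPI).
    static long pxToTwip(long px) {
        return px * TWIPS_PER_INCH / ASSUMED_DPI;
    }

    // Twips to EMU, the unit used for drawing extents in a docx.
    static long twipToEmu(long twip) {
        return twip * EMU_PER_INCH / TWIPS_PER_INCH;
    }

    public static void main(String[] args) {
        long heightPx = 250; // the height attribute from the img example

        long treatedAsTwips = twipToEmu(heightPx);            // pixel value passed straight through
        long convertedFirst = twipToEmu(pxToTwip(heightPx));  // pixel value converted to twips first

        System.out.println("treated as twips: " + treatedAsTwips + " EMU");
        System.out.println("converted from px: " + convertedFirst + " EMU");
        System.out.println("size ratio: " + (convertedFirst / treatedAsTwips));
    }
}
```

Under these assumptions, 250 treated as twips becomes 158750 EMU (about 0.17 inch), while 250 px converted first becomes 2381250 EMU (about 2.6 inches), a factor of 15.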


I've set up a temporary fix by overriding addImage() from XHTMLImageHandlerDefault. It performs the necessary additional conversion, and the images render at the expected sizes in the docx. This corrects the issue, but it is not an elegant solution.

The proper fix belongs in the method above, perhaps something similar to:
Code:
Long cx = (box.getStyle().valueByName(CSSName.WIDTH) == IdentValue.AUTO) ? null :
         UnitsOfMeasurement.twipToEMU(UnitsOfMeasurement.pxToTwip(box.getWidth()));
Long cy = (box.getStyle().valueByName(CSSName.HEIGHT) == IdentValue.AUTO) ? null :
         UnitsOfMeasurement.twipToEMU(UnitsOfMeasurement.pxToTwip(box.getHeight()));


Could you validate this problem and proposed solution?

Re: XHTML img to docx conversion

Posted: Sun Feb 22, 2015 1:16 pm
by jason
Please see/try https://github.com/plutext/docx4j-Impor ... dad0e489c8

Note there is still a potential discrepancy in DPI settings (docx4j versus xhtmlrenderer) which ought to be looked at.