
Embedded images are not properly sized: looking for a fix

PostPosted: Fri Aug 09, 2013 9:56 pm
by Siilk
Hi. I am using the latest XHTML importer (built from this source code: https://github.com/plutext/docx4j-ImportXHTML) to import HTML documents, and I keep having problems with embedded images. Each image in the source HTML document is imported at 1:1 size, regardless of the size set for the corresponding <img> tag, either in the CSS or inline. I have been trying to implement proper sizing, but I can't find a way to access the computed values for the size of the image. Can you give me some advice here? Maybe there is a way to access the data XHTMLImporter generates for the corresponding image, to get the values I need?

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Sat Aug 10, 2013 12:08 am
by jason
I just added https://github.com/plutext/docx4j-Impor ... 2fa1361791

That gives you access to the BlockBox inside addImage. The BlockBox has various things related to size, as well as getStyle().

You've still got Element e, in case you need to access img attributes directly.

Note the comment about ReplacedElementFactory; I haven't explored that.

If you do get image size working properly using dimensions from BlockBox, we'd welcome it as a contribution :-)

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Sun Aug 11, 2013 8:05 pm
by Siilk
Thanks, jason, that's a nice little commit. But I need a little more of your time. :) I more or less know what to do with an <img> that has an absolute size set, but what about a relative size? Say I have something like this: <img src="blah" width="50%" height="30%">. What would be the best way to handle this?

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Mon Aug 12, 2013 4:03 pm
by jason
You'll see in our source code a Docx4JFSImage object. There is an FSImage interface, which Docx4JFSImage does not implement. If it did (like ITextFSImage), we'd have:

public interface FSImage {
    public int getWidth();
    public int getHeight();
    public void scale(int width, int height);
}
 

But I think/hope that's unnecessary; there's a good chance that all you need from Flying Saucer is the CSS values and/or image attributes, which we then convert to twips.

Note docx4j's UnitsOfMeasurement.pxToTwip and then twipToEMU
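
For reference, the arithmetic behind a px to twip to EMU conversion can be sketched as below. This is a standalone illustration, not docx4j's own code: the class and method names are hypothetical, and 96 DPI is assumed as the default (matching the "docx4j.DPI" default mentioned further down).

```java
// Standalone sketch of the unit arithmetic; it does not call docx4j.
final class UnitSketch {
    static final int TWIPS_PER_INCH = 1440;  // 20 twips per point, 72 points per inch
    static final int EMU_PER_INCH = 914400;  // English Metric Units per inch
    static final int EMU_PER_TWIP = EMU_PER_INCH / TWIPS_PER_INCH; // 635

    /** Converts pixels to twips at the given DPI (hypothetical helper). */
    static long pxToTwip(double px, int dpi) {
        return Math.round(px * TWIPS_PER_INCH / dpi);
    }

    /** Converts twips to EMU, the unit used for image extents in a docx. */
    static long twipToEmu(long twips) {
        return twips * EMU_PER_TWIP;
    }

    public static void main(String[] args) {
        long twips = pxToTwip(200, 96); // a 200px wide image at the assumed 96 DPI
        System.out.println(twips + " twips = " + twipToEmu(twips) + " EMU");
    }
}
```

At 96 DPI one pixel is 15 twips, and one twip is always 635 EMU, so the DPI setting is the only variable in the chain.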

see BinaryPartAbstractImage's createImageInline methods for how an image is scaled

To answer your question, I guess if the CSS provides a % value, you could add a createImageInline method to BinaryPartAbstractImage (or subclass) which takes a % arg. It would use the ImageInfo object's ImageSize to calculate the relevant actual sizes. Doing it this way seems quite natural.
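
A minimal sketch of that percentage calculation, assuming the percentage is applied to the image's intrinsic pixel size (as suggested above) and that 96 DPI is used for the conversion. The class and method names are hypothetical and nothing here is existing docx4j API. Note that in CSS itself a percentage width is resolved against the containing block rather than the intrinsic image size, so this is only one possible interpretation.

```java
// Standalone sketch; percentToEmu is a hypothetical helper, not a docx4j method.
final class PercentScaleSketch {
    static final int EMU_PER_INCH = 914400;

    /** Scales an intrinsic pixel dimension by a percentage and returns the result in EMU. */
    static long percentToEmu(int intrinsicPx, double percent, int dpi) {
        double scaledPx = intrinsicPx * percent / 100.0;
        return Math.round(scaledPx * EMU_PER_INCH / dpi);
    }

    public static void main(String[] args) {
        // A 400x300 px image with width="50%" height="30%", at an assumed 96 DPI.
        long cx = percentToEmu(400, 50, 96);
        long cy = percentToEmu(300, 30, 96);
        System.out.println("cx=" + cx + " EMU, cy=" + cy + " EMU");
    }
}
```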

Note that docx4j has a property "docx4j.DPI" which is used in UnitsOfMeasurement, and which you can use (but note comments below re file format settings).

Hope this helps...

=========

Flying Saucer notes
~~~~~~~~~~~~~~~~~~~


Flying Saucer:- http://flyingsaucerproject.github.io/fl ... tml#xil_39

For intrinsic width/height calculations we assume a resolution of 96 DPI, but setting an explicit width/height makes it possible to use an arbitrary DPI.

Note that there is https://github.com/plutext/flyingsaucer ... eUtil.java
but I don't think it is used

But in https://github.com/plutext/flyingsaucer ... derer.java at line 62 we have:

private static final float DEFAULT_DOTS_PER_POINT = 20f;
private static final int DEFAULT_DOTS_PER_PIXEL = 20;

These values are used at line 150:

_sharedContext.setDPI(72*_dotsPerPoint);
_sharedContext.setDotsPerPixel(dotsPerPixel);

to set the DPI and dots-per-pixel attributes of the SharedContext object

setting DPI also sets mm/dot:

this.mm_per_dot = (CM__PER__IN * MM__PER__CM) / dpi;

You can see these used in https://github.com/plutext/flyingsaucer ... Value.java at line 110 and following, but this is only used for fonts?
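
To make the effect of those two lines concrete, here is a small standalone sketch of the same arithmetic (DPI = 72 * dotsPerPoint, mm per dot = 25.4 / DPI). The class and method names are hypothetical; only the formulas come from the snippets above.

```java
// Standalone sketch of the DPI arithmetic quoted above; not Flying Saucer code.
final class DpiSketch {
    static final double MM_PER_INCH = 25.4; // CM__PER__IN * MM__PER__CM

    /** DPI as derived from dots per point (72 points per inch). */
    static double dpi(double dotsPerPoint) {
        return 72 * dotsPerPoint;
    }

    /** Millimetres covered by a single dot at the given DPI. */
    static double mmPerDot(double dpi) {
        return MM_PER_INCH / dpi;
    }

    public static void main(String[] args) {
        double[] candidates = { 20.0, 20.0 * 4.0 / 3.0 };
        for (double dotsPerPoint : candidates) {
            double d = dpi(dotsPerPoint);
            System.out.println("dotsPerPoint=" + dotsPerPoint
                    + " dpi=" + d + " mmPerDot=" + mmPerDot(d));
        }
    }
}
```

With 20 dots per point the DPI is 1440, so one dot is exactly one twip; with 20 * 4 / 3 dots per point the DPI is 1920.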


docx file format notes
~~~~~~~~~~~~~~~~~~~~~~


Per [MS-DOCX], 2.6.1.12 defaultImageDpi specifies the resolution in dots per inch (DPI) at which images in the document will be saved.

For what it is worth, you can get this value from CTSettings getDefaultImageDpi()

This setting is ignored by images that have dots per inch (DPI) specified by useLocalDpi (as specified in [MS-ODRAWXML] section 2.3.1.13).

For example: <a14:useLocalDpi xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main" val="0"/>
.. but docx4j doesn't support that yet.

This setting is also ignored when doNotAutoCompressPictures (as specified in [ISO/IEC29500-1:2011] section 17.15.1.33) is set to "true", which is CTSettings getDoNotAutoCompressPictures()

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Mon Aug 12, 2013 4:47 pm
by Siilk
That's a lot of useful info. Thanks!

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Tue Aug 13, 2013 1:06 pm
by jason
By the way, I noticed https://github.com/plutext/docx4j-Impor ... andler.css contains:

img {
display: inline-block;
border-width: 1px 1px 1px 1px;
margin: 0px;
padding: 0px;
}

in case you are seeing these effects, and wondering where it comes from...

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Mon Aug 19, 2013 1:27 pm
by Siilk
Hi again. I modified XHTMLImporter#addImage to correctly process width and height of an <img> tag and I'd like to run the changes by you. Here they are:

At line 1501 of XHTMLImporter, instead of

Inline inline = imagePart.createImageInline(null, null, 0, 1, false);


there is now

            Inline inline;

            // Use the box size computed by Flying Saucer, unless the CSS value is "auto".
            Long cx = (box.getStyle().valueByName(CSSName.WIDTH) == IdentValue.AUTO) ? null :
                    UnitsOfMeasurement.twipToEMU(box.getWidth());
            Long cy = (box.getStyle().valueByName(CSSName.HEIGHT) == IdentValue.AUTO) ? null :
                    UnitsOfMeasurement.twipToEMU(box.getHeight());
            if (cx == null && cy == null) {
                // Neither dimension is set: keep the default image sizing.
                inline = imagePart.createImageInline(null, e.getAttribute("alt"), 0, 1, false);
            }
            else {
                // Only one dimension is set: derive the other from the aspect ratio.
                // Multiply before dividing, so integer division does not truncate the ratio.
                if (cx == null) {
                    cx = imagePart.getImageInfo().getSize().getWidthPx() * cy
                            / imagePart.getImageInfo().getSize().getHeightPx();
                }
                else if (cy == null) {
                    cy = imagePart.getImageInfo().getSize().getHeightPx() * cx
                            / imagePart.getImageInfo().getSize().getWidthPx();
                }
                inline = imagePart.createImageInline(null, e.getAttribute("alt"), 0, 1, cx, cy, false);
            }


As you can see, the changes are fairly compact, while providing proper image resizing in accordance with the provided height and width.
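
The aspect-ratio step can also be looked at in isolation. The following is a standalone sketch with hypothetical names (it does not use docx4j or Flying Saucer types): given an optional width and height in EMU plus the intrinsic pixel size, it derives whichever extent is missing.

```java
// Standalone sketch of the aspect-ratio logic; completeExtents is a hypothetical helper.
final class AspectRatioSketch {
    /**
     * Returns {cx, cy} in EMU, deriving a missing extent from the intrinsic
     * aspect ratio, or null when neither extent was specified.
     */
    static long[] completeExtents(Long cx, Long cy, int widthPx, int heightPx) {
        if (cx == null && cy == null) {
            return null; // caller falls back to the image's default sizing
        }
        if (cx == null) {
            cx = Math.round(cy * (double) widthPx / heightPx);
        } else if (cy == null) {
            cy = Math.round(cx * (double) heightPx / widthPx);
        }
        return new long[] { cx, cy };
    }

    public static void main(String[] args) {
        // Only the height is given for a 400x300 px image.
        long[] extents = completeExtents(null, 1000000L, 400, 300);
        System.out.println("cx=" + extents[0] + " cy=" + extents[1]);
    }
}
```

Doing the division in floating point and rounding once avoids the truncation that plain integer division would introduce.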

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Mon Aug 19, 2013 6:03 pm
by Siilk
Oh, just to make things clear: I'll make a proper pull request if you would like the code I suggested. :)

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Sat Aug 24, 2013 7:22 pm
by Siilk
I actually made a pull request today, as the code is now stable and works as correctly as it can, given Flying Saucer's current settings. My commit also contains a unit test and a sample XHTML file that I used to test the image importing.

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Mon Aug 26, 2013 9:39 am
by jason
Hi there, thanks very much for your patch; https://github.com/plutext/docx4j-ImportXHTML/pull/4 which I have applied.

I'd like to follow up on the comments you made there, while it is still fresh:-

Siilk wrote:Initially, the problem was in determining the proper place to take image modified size from, as it wasn't obvious how flying saucer keeps them.

So we tried to recalculate the dimensions manually, adjusting the actual formula depending on the type of the value used in the original html, but that led to overcomplicated code that would have been hard to maintain.

Thus we had to rely on flyingsaucer's mechanics and, additionally, on UnitsOfMeasurement#twipToEMU to convert the dimensions.

Unfortunately that was complicated by saucer not keeping any exact data for an image dimension that hasn't been resized in the html, and by the fact that saucer has its DocxRenderer initialized with DEFAULT_DOTS_PER_POINT = DEFAULT_DOTS_PER_PIXEL = 20. The latter is currently causing all the pt-based size units (inches, cms etc) to be distorted.


So Flying Saucer is using these values to calculate CSSName.HEIGHT and CSSName.WIDTH?

Is it using DEFAULT_DOTS_PER_POINT, DEFAULT_DOTS_PER_PIXEL, or both?

Note that there is a constructor:

public DocxRenderer(float dotsPerPoint, int dotsPerPixel)


Does XHTMLImporter need to be using that?

Siilk wrote:On top of that, the image resizing function CxCy#scale, which is used in BinaryPartAbstractImage#createImageInline to scale an image that has no set dimensions, uses an artificially set dpi value for images with no internally set dpi.

This value is calculated depending on the current screen resolution which also contributes to the image scale distortion.


Are you talking about org.apache.xmlgraphics.image.loader.ImageInfo here?

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Mon Aug 26, 2013 9:04 pm
by Siilk
jason wrote:
Siilk wrote:Initially, the problem was in determining the proper place to take image modified size from, as it wasn't obvious how flying saucer keeps them.

So we tried to recalculate the dimensions manually, adjusting the actual formula depending on the type of the value used in the original html, but that led to overcomplicated code that would have been hard to maintain.

Thus we had to rely on flyingsaucer's mechanics and, additionally, on UnitsOfMeasurement#twipToEMU to convert the dimensions.

Unfortunately that was complicated by saucer not keeping any exact data for an image dimension that hasn't been resized in the html, and by the fact that saucer has its DocxRenderer initialized with DEFAULT_DOTS_PER_POINT = DEFAULT_DOTS_PER_PIXEL = 20. The latter is currently causing all the pt-based size units (inches, cms etc) to be distorted.


So Flying Saucer is using these values to calculate CSSName.HEIGHT and CSSName.WIDTH?


No, it uses them to calculate box.getWidth() and box.getHeight(). CSSName.HEIGHT and CSSName.WIDTH store the unmodified original values of width and height, taken from the html tag.

jason wrote:Is it using DEFAULT_DOTS_PER_POINT, DEFAULT_DOTS_PER_PIXEL, or both?

Note that there is a constructor:

public DocxRenderer(float dotsPerPoint, int dotsPerPixel)


Does XHTMLImporter need to be using that?


The current call is importer.renderer = new DocxRenderer(), meaning the importer's DocxRenderer is created with default values, which are set inside it like this:

//    private static final float DEFAULT_DOTS_PER_POINT = 20f * 4f / 3f;
//    private static final int DEFAULT_DOTS_PER_PIXEL = 20;

private static final float DEFAULT_DOTS_PER_POINT = 20f;
private static final int DEFAULT_DOTS_PER_PIXEL = 20;
 

Note the original values, commented above the current ones.
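
To illustrate why the two settings behave differently, here is a standalone sketch that measures one inch in dots two ways: as 96 CSS pixels and as 72 points (CSS defines both as one inch). The class and method names are hypothetical; only the constant values are taken from the snippet above.

```java
// Standalone sketch comparing px-based and pt-based lengths in dots.
final class PixelVsPointSketch {
    /** Dots covered by 96 CSS pixels, which CSS defines as one inch. */
    static double inchInDotsViaPixels(double dotsPerPixel) {
        return 96 * dotsPerPixel;
    }

    /** Dots covered by 72 points, which is also one inch. */
    static double inchInDotsViaPoints(double dotsPerPoint) {
        return 72 * dotsPerPoint;
    }

    /** Ratio between the two measurements; 1.0 means px and pt lengths agree. */
    static double mismatch(double dotsPerPoint, double dotsPerPixel) {
        return inchInDotsViaPixels(dotsPerPixel) / inchInDotsViaPoints(dotsPerPoint);
    }

    public static void main(String[] args) {
        System.out.println("current  (20, 20):       " + mismatch(20.0, 20.0));
        System.out.println("original (20 * 4/3, 20): " + mismatch(20.0 * 4.0 / 3.0, 20.0));
    }
}
```

With both constants at 20, a pixel and a point cover the same number of dots, so lengths given in px and lengths given in pt-based units differ by a factor of 4/3 relative to CSS. With the original 20 * 4 / 3 dots per point, the two agree.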

As for whether XHTMLImporter should use the explicit DocxRenderer(dotsPerPoint, dotsPerPixel) constructor, it's a tough question, as there is probably a lot of code already depending on equal values for dpp and dppx, both inside the importer itself and in all the projects that use it.

jason wrote:
Siilk wrote:On top of that, the image resizing function CxCy#scale, which is used in BinaryPartAbstractImage#createImageInline to scale an image that has no set dimensions, uses an artificially set dpi value for images with no internally set dpi.

This value is calculated depending on the current screen resolution which also contributes to the image scale distortion.


Are you talking about org.apache.xmlgraphics.image.loader.ImageInfo here?


Yes. For example, for a GIF image, imageInfo is created with org.apache.xmlgraphics.image.loader.impl.PreloaderGIF#determineSize.

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Tue Aug 27, 2013 9:25 am
by jason
Siilk wrote:As for whether XHTMLImporter should use the explicit DocxRenderer(dotsPerPoint, dotsPerPixel) constructor, it's a tough question, as there is probably a lot of code already depending on equal values for dpp and dppx, both inside the importer itself and in all the projects that use it.


Hi, a couple of questions:

first, did you investigate how Flying Saucer actually uses DEFAULT_DOTS_PER_POINT and DEFAULT_DOTS_PER_PIXEL? I had a quick look (see earlier in this thread), but probably not as thoroughly as you :-)

second, ignoring for the moment code which may already be depending on equal value for dpp and dppx, what do you think would be the correct thing for docx4j to be doing? we can analyse the impact of any change based on q1 above; if a fix seems warranted, the 3.0 release is a good time to be making the change.

Siilk wrote:box.getWidth() and box.getHeight(). CSSName.HEIGHT and CSSName.WIDTH store unmodified original values of width and height, derived for html tag.


Please correct me if I'm wrong, but since your contribution uses these "unmodified original values", we're currently not reliant on the values of DEFAULT_DOTS_PER_POINT and DEFAULT_DOTS_PER_PIXEL in Flying Saucer?

kind regards .. Jason

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Wed Aug 28, 2013 10:43 pm
by Siilk
jason wrote:Hi, a couple of questions:

first, did you investigate how Flying Saucer actually uses DEFAULT_DOTS_PER_POINT and DEFAULT_DOTS_PER_PIXEL? I had a quick look (see earlier in this thread), but probably not as thoroughly as you :-)


In short, they are used to initialize SharedContext's dpi and mm_per_dot values. The latter in turn is used internally by flyingsaucer's code, for example in LengthValue.calcFloatProportionalValue().

jason wrote:second, ignoring for the moment code which may already be depending on equal value for dpp and dppx, what do you think would be the correct thing for docx4j to be doing? we can analyse the impact of any change based on q1 above; if a fix seems warranted, the 3.0 release is a good time to be making the change.


In my opinion, the best way to fix the dpp/dppx situation is to initialize DocxRenderer with proper values manually. As for the dpi values calculated for images: as far as I understand, XHTMLImporter uses its own custom build of flying saucer, so I guess rewriting that code to use Docx4jProperties.getProperty("docx4j.DPI", "96") to have a consistent project-wide dpi value would be a good idea.
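
As a rough sketch of that idea, the lookup could be a single helper that every conversion goes through. The example below uses java.util.Properties as a stand-in for Docx4jProperties so that it is self-contained; the class name, the helper and the fallback on a malformed value are all assumptions made for illustration.

```java
import java.util.Properties;

// Standalone sketch; java.util.Properties stands in for Docx4jProperties here.
final class DpiPropertySketch {
    static final String DPI_KEY = "docx4j.DPI";
    static final int DEFAULT_DPI = 96;

    /** Reads the DPI property, falling back to 96 when it is missing or malformed. */
    static int readDpi(Properties props) {
        String raw = props.getProperty(DPI_KEY, String.valueOf(DEFAULT_DPI));
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException ex) {
            return DEFAULT_DPI; // assumption: prefer a default over failing the import
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(DPI_KEY, "120");
        System.out.println("dpi=" + readDpi(props));
    }
}
```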

jason wrote:Please correct me if I'm wrong, but since your contribution uses these "unmodified original values", we're currently not reliant on the values of DEFAULT_DOTS_PER_POINT and DEFAULT_DOTS_PER_PIXEL in Flying Saucer?


No, the implemented resizing uses box.getWidth() and box.getHeight(), as otherwise we would have had to recalculate all the values manually, taking all the styles into account. Thus, having proper values for box.getWidth() and box.getHeight() is crucial. BTW, initially I tried to implement the manual conversion from CSSName.HEIGHT and CSSName.WIDTH, but the result was inelegant and inflexible, so my team lead rejected this approach in favour of making use of flyingsaucer's box dimensions.

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Thu Sep 12, 2013 10:05 am
by jason
https://github.com/plutext/flyingsaucer ... 779541080c sets DEFAULT_DOTS_PER_POINT = 20f * 4f / 3f
(reverting back to original FS value), and adds a constructor to allow the user to alter this.

This will be used by docx4j 3.0, to be released soon.

I'd be grateful if you could make any changes to your contribution which are necessitated by this. Please update your docx4j-XHTMLImporter to the latest from GitHub first, since there have been some recent changes.

Siilk wrote: I guess rewriting that code to use Docx4jProperties.getProperty("docx4j.DPI", "96") to have a consistent project wide dpi value would be a good idea.


That does sound sensible.

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Fri Sep 13, 2013 9:50 pm
by Siilk
No changes are necessary as my code was specifically designed to be DPP-independent.

Re: Embedded images are not properly sized: looking for a fi

PostPosted: Fri Sep 13, 2013 11:26 pm
by jason
ok great. thanks...