Docx to HTML with header images

Posted: Tue Jan 24, 2012 2:41 am
by fabiog
Hi,
sorry for my poor English.
I am trying to convert a docx file to an HTML file, but images in the header section do not appear in the HTML output. In the generated HTML there is a tag like this for the header image:
Code:
<img height="20" id="rId1" width="20" />
but it has no src attribute, so there is no link to a picture.
The Java code I use for the HTML export is:
Code:
// Tell the exporter where to write extracted images and how to reference them from the HTML
HtmlSettings hs = new HtmlSettings();
hs.setImageDirPath(pathDirOutput + dirImgsPrefName + fileNoExt + "/");
hs.setImageTargetUri(dirImgsPrefName + fileNoExt + "/");

// Load the docx and create the exporter
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File(fileNameWithPath));
HtmlExporterNG2 exporter = new HtmlExporterNG2();

// Write the generated HTML to the output file, then release the stream
FileOutputStream fo = new FileOutputStream(pathDirOutput + fileNameHtml);
javax.xml.transform.stream.StreamResult result = new javax.xml.transform.stream.StreamResult(fo);
exporter.html(wordMLPackage, result, hs);
fo.close();

Please help.
Thanks.
Any help/solution is highly appreciated.
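
As a quick way to confirm the symptom described above, the generated HTML can be scanned for img tags that have no src attribute. This is only a rough diagnostic sketch using the JDK's regex classes, not a real HTML parser; the class and method names (MissingImgSrcCheck, imgTagsWithoutSrc) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MissingImgSrcCheck {

    // Matches a whole <img ...> tag; good enough for a quick check, not a full HTML parser.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches a src attribute inside a tag.
    private static final Pattern SRC_ATTR =
            Pattern.compile("\\ssrc\\s*=", Pattern.CASE_INSENSITIVE);

    /** Returns every img tag in the given HTML that has no src attribute. */
    static List<String> imgTagsWithoutSrc(String html) {
        List<String> missing = new ArrayList<>();
        Matcher m = IMG_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            if (!SRC_ATTR.matcher(tag).find()) {
                missing.add(tag);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // First tag is the one from the post; second is an invented, correctly linked image.
        String html = "<p><img height=\"20\" id=\"rId1\" width=\"20\" /></p>"
                + "<p><img src=\"imgs/image1.png\" height=\"20\" width=\"20\" /></p>";
        for (String tag : imgTagsWithoutSrc(html)) {
            System.out.println("img without src: " + tag);
        }
    }
}
```

Run on the sample string, only the header tag from the post is reported, because the second tag carries a src attribute.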

Re: Docx to HTML with header images

Posted: Tue Jan 24, 2012 1:03 pm
by jason
In the XSLT docx2xhtmlNG2.xslt, images are handled by the following templates:

Code:
<xsl:template match="wp:inline|wp:anchor">

    <xsl:variable name="wpinline" select="."/>

    <xsl:choose>
        <!-- sanity check -->
        <xsl:when test="./a:graphic/a:graphicData/pic:pic">
            <xsl:copy-of select="java:org.docx4j.model.images.WordXmlPictureE20.createHtmlImgE20(
                    $wmlPackage,
                    $imageHandler,
                    $wpinline)" />
        </xsl:when>
        <xsl:otherwise>
            <xsl:copy-of
                    select="java:org.docx4j.convert.out.html.HtmlExporterNG2.notImplemented(., ' without pic:pic ' )" />
        </xsl:otherwise>
    </xsl:choose>

</xsl:template>

<!-- E1.0 images -->
<xsl:template match="w:pict">

    <xsl:choose>
        <xsl:when test="./v:shape/v:imagedata">
            <xsl:variable name="wpict" select="."/>
            <xsl:copy-of select="java:org.docx4j.model.images.WordXmlPictureE10.createHtmlImgE10(
                    $wmlPackage,
                    $imageHandler,
                    $wpict)" />
        </xsl:when>
        <xsl:otherwise>
            <xsl:comment>TODO: handle w:pict containing other than ./v:shape/v:imagedata</xsl:comment>
            <xsl:copy-of
                    select="java:org.docx4j.convert.out.html.HtmlExporterNG2.notImplemented(., ' without v:imagedata ' )" />
        </xsl:otherwise>
    </xsl:choose>

</xsl:template>

 

As you can see, these templates call Java extension functions.

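The problem is that the Java extension functions assume they have been called from the main document part, that is, that the images are relationships (rels) of that part. An image referenced from a header is a relationship of the header part instead, so the lookup does not find it.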

To fix this, the templates would need to be made aware of which part was invoking them (by name, say), and this value would need to be passed through to the extension function.
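
To illustrate the idea, here is a minimal, self-contained sketch. It is not docx4j code: the Part and Pkg classes and both resolve methods are hypothetical stand-ins, and the part names and image target are invented for the example. It only shows why a relationship id from a header resolves to nothing when looked up in the main document part, and how passing the invoking part's name through would change that.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class PartAwareImageLookup {

    /** Hypothetical stand-in for a package part; each part owns its own relationships. */
    static final class Part {
        final String name;
        final Map<String, String> rels = new HashMap<>(); // relationship id -> image target

        Part(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for the package: parts by name, plus the main document part. */
    static final class Pkg {
        final Map<String, Part> parts = new HashMap<>();
        final Part mainPart;

        Pkg(Part mainPart) {
            this.mainPart = mainPart;
            add(mainPart);
        }

        void add(Part part) {
            parts.put(part.name, part);
        }
    }

    /** The assumption described above: the id is always looked up in the main document part. */
    static Optional<String> resolveAssumingMainPart(Pkg pkg, String relId) {
        return Optional.ofNullable(pkg.mainPart.rels.get(relId));
    }

    /** The proposed fix: the invoking part's name is passed through and used for the lookup. */
    static Optional<String> resolveFromSourcePart(Pkg pkg, String sourcePartName, String relId) {
        Part source = pkg.parts.get(sourcePartName);
        if (source == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(source.rels.get(relId));
    }

    /** Builds a tiny example package: a main part with no images and a header with one. */
    static Pkg samplePackage() {
        Pkg pkg = new Pkg(new Part("/word/document.xml"));
        Part header = new Part("/word/header1.xml");
        header.rels.put("rId1", "media/image1.png");
        pkg.add(header);
        return pkg;
    }

    public static void main(String[] args) {
        Pkg pkg = samplePackage();
        System.out.println(resolveAssumingMainPart(pkg, "rId1").orElse("(not found)"));
        System.out.println(
                resolveFromSourcePart(pkg, "/word/header1.xml", "rId1").orElse("(not found)"));
    }
}
```

In this toy model the first lookup finds nothing and the second finds the header's image, which matches the img tag with no picture link seen in the generated HTML.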

Re: Docx to HTML with header images

Posted: Tue Jan 24, 2012 10:54 pm
by fabiog
But is it possible to fix this problem, or would it need big changes to docx2xhtmlNG2.xslt?

Re: Docx to HTML with header images

Posted: Wed Jan 25, 2012 10:52 am
by jason
Sure, it is possible to fix; I've sketched out what needs to be done. In terms of size, the change would probably take me around an hour, but as things stand it isn't a high priority for me, since headers in HTML output aren't that important to most users. Trust this makes sense.

Re: Docx to HTML with header images

Posted: Wed Jun 20, 2012 7:47 pm
by skyvic
Hi Jason,

Are you planning to fix this issue in the near future?

That would be very useful for me. :)

Thanks,
skyvic.

Re: Docx to HTML with header images

Posted: Thu Jun 21, 2012 10:48 pm
by jason
I've created https://github.com/plutext/docx4j/issues/10 and scheduled this to be done before 2.8.1.

Re: Docx to HTML with header images

Posted: Fri Apr 26, 2013 1:30 am
by Joerakel
Hi Jason,

Right now I am working with docx4j to convert several docx files to XHTML, and it works fine so far, but images in headers would be a really useful feature for me too, because most of my documents have at least one. Will this be included in 3.0? (It wasn't in 2.8.1, right? At least it does not work for me.)

Re: Docx to HTML with header images

Posted: Wed Jul 17, 2013 4:18 pm
by jason
Issue 10 is fixed by https://github.com/plutext/docx4j/commi ... a9aecfac54
The fix will be in nightlies after today, and in 3.0.