
HtmlExporter - Generating relative paths in <img src ..>

Posted: Fri Jun 19, 2009 10:16 pm
by Leigh
Hi,

I am using the org.docx4j.convert.out.html.HtmlExporter sample to export a document to html. It works well, but I noticed the output uses the file protocol for the image source. Does anyone know if there is a way to generate a relative path instead?

Current output
Code:
<img src="file:///C:/somedirectory/myfilename.html_files/image1.png" ....>


Preferred output
Code:
<img src="myfilename.html_files/image1.png" ....>


-Leigh
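
For reference, the preferred output above amounts to expressing the image location relative to the folder that holds the HTML file. Below is a minimal sketch of that path arithmetic, assuming plain java.nio.file rather than anything in docx4j. The class and method names (RelativeImgSrc, relativeSrc) are hypothetical, and the result is not URL-encoded, so names containing spaces would need extra handling.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not part of docx4j. */
final class RelativeImgSrc {

    /**
     * Returns the image location relative to the directory containing the
     * HTML file, joined with forward slashes for use in an src attribute.
     */
    static String relativeSrc(Path htmlFile, Path imageFile) {
        Path htmlDir = htmlFile.toAbsolutePath().normalize().getParent();
        Path image = imageFile.toAbsolutePath().normalize();
        Path relative = htmlDir.relativize(image);

        // Join the path elements with "/" regardless of the platform separator
        StringBuilder src = new StringBuilder();
        for (Path part : relative) {
            if (src.length() > 0) {
                src.append('/');
            }
            src.append(part.toString());
        }
        return src.toString();
    }

    public static void main(String[] args) {
        Path html = Paths.get("out", "myfilename.html");
        Path img = Paths.get("out", "myfilename.html_files", "image1.png");
        // prints "myfilename.html_files/image1.png"
        System.out.println(relativeSrc(html, img));
    }
}
```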

Re: HtmlExporter - Generating relative paths in <img src ..>

Posted: Sat Jun 20, 2009 6:16 am
by jason
You'll need a minor modification to the source code to fix that.

See http://dev.plutext.org/trac/docx4j/brow ... cture.java

in the current SVN HEAD, at around line 227, and the fixImgSrcURL method at line 301.

Re: HtmlExporter - Generating relative paths in <img src ..>

Posted: Sun Jun 21, 2009 6:06 pm
by Leigh
Hi Jason,

Thanks. That was much simpler than I thought. I made some small changes to fixImgSrcURL. Now it works exactly the way I want. If anyone spots any potential problems, let me know.

Cheers
Leigh

Code:
    static String fixImgSrcURL( FileObject fo)
    {
       String srcPath = null;

       try {
          // Keep the URL's original case for output; paths can be case-sensitive
          String itemUrl = fo.getURL().toExternalForm();
          // Lowercase copy, used only to test the scheme
          String lowerUrl = itemUrl.toLowerCase();
          log.info("itemURL =" + itemUrl);

          if ( lowerUrl.startsWith("http://") || lowerUrl.startsWith("https://") )
          {
             srcPath = itemUrl;
          }
          // convert file protocol to relative reference
          else if ( lowerUrl.startsWith("file://") )
          {
             srcPath = "";
             if (fo.getParent() != null)
             {
                srcPath = fo.getParent().getName().getBaseName() + "/";
             }
             srcPath = srcPath + fo.getName().getBaseName();
          }
          else if ( lowerUrl.startsWith("webdav://") )
          {
             // TODO - convert to http:, dropping username / password
             srcPath = itemUrl;
          }
          else {
             log.warn("How to handle scheme: " + itemUrl );
          }
       } catch (FileSystemException e) {
          // Log through the class logger instead of printing the stack trace
          log.error("Could not resolve image URL", e);
       }

       return srcPath;
    }

Re: HtmlExporter - Generating relative paths in <img src ..>

Posted: Mon Jun 22, 2009 2:25 pm
by jason
Thanks for that Leigh; I've committed a change based on your code:

http://dev.plutext.org/trac/docx4j/changeset/843

cheers

Jason

Re: HtmlExporter - Generating relative paths in <img src ..>

Posted: Tue Jun 23, 2009 1:45 am
by Leigh
Hi Jason,

I spoke too soon. The relative paths work for the html export but not the pdf exports, which seem to require the full file path.

From what I can tell, all of the pdf exporters use the system temp directory for storage. So I thought a simple fix would be to check the image path folder. If it is the temp directory, then assume a pdf export is running and return the full path. Otherwise, assume it is an html export and return a relative path.

I had some other ideas, but that seemed the simplest, and it works with both html exporters and all three pdf exporters. Here is a rough copy of what I had in mind. Can you think of any pitfalls with this approach?

Code:
    static String fixImgSrcURL( FileObject fo)
    {
       String itemUrl = null;
       try {
          // Keep the URL's original case for output; paths can be case-sensitive
          itemUrl = fo.getURL().toExternalForm();
          // Lowercase copy, used only to test the scheme
          String lowerUrl = itemUrl.toLowerCase();
          log.info("itemURL =" + itemUrl);

          if ( lowerUrl.startsWith("http://") || lowerUrl.startsWith("https://") )
          {
             return itemUrl;
          }
          // handle file protocol references
          else if ( lowerUrl.startsWith("file://") )
          {
             FileObject parent = fo.getParent();
             if (parent == null) {
                // No parent folder to compare or to build a relative path from
                return itemUrl;
             }
             // If the image is being stored in the system temp directory, assume
             // this is a pdf export via HTML, and use absolute file paths
             FileObject tmpDir = getFileSystemManager().resolveFile(System.getProperty("java.io.tmpdir"));
             String tmpURL = tmpDir.getURL().toExternalForm();

             String parentUrl = parent.getURL().toExternalForm();
             if (parentUrl.equalsIgnoreCase(tmpURL)) {
                return itemUrl;
             }
             // Otherwise, assume it is an html export and return a relative path
             else {
                return parent.getName().getBaseName() + "/"
                       + fo.getName().getBaseName();
             }
          }
          else if ( lowerUrl.startsWith("webdav://") )
          {
             // TODO - convert to http:, dropping username / password
             return itemUrl;
          }
          log.warn("How to handle scheme: " + itemUrl );
       } catch (FileSystemException e) {
          // Log through the class logger instead of printing the stack trace
          log.error("Could not resolve image URL", e);
       }

       return itemUrl;
    }


-Leigh
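
The heuristic above depends entirely on the "is the image folder the temp directory?" comparison. As a rough illustration of that one check in isolation, here is a sketch that compares normalized java.nio.file paths instead of URL strings. It assumes the image is on the local file system; it does not use Commons VFS, and the names TempDirCheck and isDirectlyInTempDir are hypothetical. It does not resolve symbolic links, and an image in a subfolder of the temp directory is deliberately not treated as a match.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not part of docx4j or Commons VFS. */
final class TempDirCheck {

    /** True only if the file's immediate parent is the system temp directory. */
    static boolean isDirectlyInTempDir(Path imageFile) {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"))
                .toAbsolutePath().normalize();
        Path parent = imageFile.toAbsolutePath().normalize().getParent();
        return parent != null && parent.equals(tmp);
    }

    public static void main(String[] args) {
        String tmp = System.getProperty("java.io.tmpdir");
        // Image saved straight into the temp directory
        System.out.println(isDirectlyInTempDir(Paths.get(tmp, "image1.png")));
        // Image saved into a "_files" folder below the temp directory
        System.out.println(isDirectlyInTempDir(Paths.get(tmp, "doc.html_files", "image1.png")));
    }
}
```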

Re: HtmlExporter - Generating relative paths in <img src ..>

Posted: Tue Jun 23, 2009 3:38 pm
by jason
Hi Leigh

The pdf exporters only use the temp dir because the following parameter is passed into the XSLT:

Code:
      String imageDirPath = System.getProperty("java.io.tmpdir");
      settings.put("imageDirPath", imageDirPath);


At present, to save your HTML and associated images, you typically do something like:

Code:
         OutputStream os = new java.io.FileOutputStream(inputfilepath + ".html");
         javax.xml.transform.stream.StreamResult result = new javax.xml.transform.stream.StreamResult(os);
         exporter.html(wordMLPackage, result, inputfilepath + "_files");


so there is nothing to stop the user from choosing to save their files to the tmpdir (which we don't want to relativise of course).

Subject to addressing this, the convention seems pretty sound. (One case to consider is where a user wants to generate both a PDF and an HTML, and for performance reasons, doesn't want to save the image twice, but I think we can defer this until someone speaks up)

We could change the signature so the user doesn't get to choose the directory for their HTML images (i.e. it is always inputfilepath + "_files"), or make the HTML exporters throw an exception if the user tries to use the tmpdir? Maybe we throw the exception if they try to use tmpdir, but also offer a convenience signature where they don't have to specify an image dir at all.

thanks

Jason
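
To make the last suggestion concrete, here is a rough sketch of what "throw if tmpdir, plus a convenience signature" could look like. It is only an illustration under assumptions: HtmlImageDirPolicy, checkImageDir and defaultImageDir are hypothetical names, not docx4j API, and the choice of IllegalArgumentException is likewise an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch; these names are not docx4j API. */
final class HtmlImageDirPolicy {

    /** Convenience form: derive the image directory from the output path. */
    static Path defaultImageDir(String inputFilePath) {
        return checkImageDir(inputFilePath + "_files");
    }

    /** Explicit form: the caller chooses the directory, but the temp dir is rejected. */
    static Path checkImageDir(String imageDirPath) {
        Path tmp = normalized(System.getProperty("java.io.tmpdir"));
        Path imageDir = normalized(imageDirPath);
        if (imageDir.equals(tmp)) {
            throw new IllegalArgumentException(
                    "HTML images must not be saved to the temp directory: " + imageDir);
        }
        return imageDir;
    }

    private static Path normalized(String path) {
        return Paths.get(path).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        System.out.println(defaultImageDir("report.docx"));
        try {
            checkImageDir(System.getProperty("java.io.tmpdir"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a guard like this, an HTML export into the temp directory fails early instead of silently producing absolute file paths.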

Re: HtmlExporter - Generating relative paths in <img src ..>

Posted: Tue Jun 23, 2009 9:13 pm
by Leigh
Hi Jason,

"... or make the HTML exporters throw an exception if the user tries to use the tmpdir? Maybe we throw the exception if they try to use tmpdir, but also offer a convenience signature where they don't have to specify an image dir at all."


Yes, I like that approach better than not allowing users to choose the output directory. I think a less restrictive approach provides greater flexibility in terms of usage, while maintaining simplicity with a convenience signature.

"(One case to consider is where a user wants to generate both a PDF and an HTML, and for performance reasons, doesn't want to save the image twice, but I think we can defer this until someone speaks up)"


Yes, I was thinking that would be problematic, unless maybe the imageDirPath were split into two properties: a relative and an absolute path. I was curious, so I ran some tests using that approach. On the one hand, I liked the fact that you could access whichever path was needed in a given context (relative or absolute), so the code felt "cleaner". On the other hand, I thought it lacked the elegance of the temp directory approach.

-Leigh
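
For comparison, the "two properties" idea could be sketched roughly as follows. This is not the code used in those tests; ImageDirs, htmlSrc and pdfSrc are hypothetical names, and the sketch assumes the HTML output wants a relative reference while the PDF path wants an absolute file URL.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical holder for both forms of the image directory; not docx4j API. */
final class ImageDirs {

    private final Path absoluteDir;   // where the images are actually written
    private final String relativeDir; // how the HTML file refers to that folder

    ImageDirs(Path absoluteDir, String relativeDir) {
        this.absoluteDir = absoluteDir.toAbsolutePath().normalize();
        this.relativeDir = relativeDir;
    }

    /** Relative reference, for HTML output. */
    String htmlSrc(String imageName) {
        return relativeDir.isEmpty() ? imageName : relativeDir + "/" + imageName;
    }

    /** Absolute file URL, for PDF output via HTML. */
    String pdfSrc(String imageName) {
        return absoluteDir.resolve(imageName).toUri().toString();
    }

    public static void main(String[] args) {
        ImageDirs dirs = new ImageDirs(
                Paths.get("out", "report.html_files"), "report.html_files");
        System.out.println(dirs.htmlSrc("image1.png"));
        System.out.println(dirs.pdfSrc("image1.png"));
    }
}
```

The image is saved once, and each exporter asks for the form it needs, which also covers the "both PDF and HTML" case without relying on the temp directory convention.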

Re: HtmlExporter - Generating relative paths in <img src ..>

Posted: Wed Jun 24, 2009 2:37 am
by jason
Hi Leigh

I've committed http://dev.plutext.org/trac/docx4j/changeset/847

which gives relative paths for HTML and has PDF working again.

The bit about "throw the exception if they try to use tmpdir [for HTML], but also offer a convenience signature where they don't have to specify an image dir at all" I will leave for later.

cheers

Jason