
Need help with images in documents when converting to HTML

Posted: Tue Nov 26, 2013 12:12 pm
by alucard81
Hi,

I am working on a project that involves converting old Word .doc forms into HTML. I have converted the .doc files into .docx through MS's own migration tool. The files are then converted with docx4j and further processed into FreeMarker template files.

I have an issue with the images that were embedded in the Word files. The code I wrote can extract the images into the folder I have specified, but the generated HTML files do not link to them.

I tried using the latest beta version of docx4j, but that either produces NULL IMG tags or a small placeholder that says "NOT YET IMPLEMENTED".

Is this an unimplemented feature, or one that is still coming down the pipeline?

Any insights will be welcomed.

Re: Need help with images in documents when converting to HT

Posted: Tue Nov 26, 2013 12:25 pm
by jason
You might be encountering an issue with the beta which was fixed by https://github.com/plutext/docx4j/commi ... 136fcbd52f
but I suspect not.

docx4j 3.0 proper is currently being synched into Maven Central, so you could try that later today or tomorrow.

You are welcome to post a docx which exhibits the issues: null img tag, and whatever is not yet implemented, and I'll take a look.

Re: Need help with images in documents when converting to HT

Posted: Tue Nov 26, 2013 1:10 pm
by alucard81
Thanks for the quick reply!!

I am more worried that I was doing something wrong with the new facade methods.

Code:
WordprocessingMLPackage docx = WordprocessingMLPackage.load(input);
AbstractHtmlExporter exporter = new HtmlExporterNG2();

// Use file system, so there is somewhere to save images (if any)
os = new java.io.FileOutputStream(inputfilepath + ".html");

HtmlSettings htmlSettings = new HtmlSettings();

// Directory on disk where the extracted images are written
htmlSettings.setImageDirPath(inputfilepath + "_files");
// URI prefix used when the generated HTML refers to those images
htmlSettings.setImageTargetUri(
        "/PSMS_POC_Prototype/Converted/"
        + inputfilepath.substring(inputfilepath.lastIndexOf("\\") + 1) + "_files");
// Wrap the converted body content in a form element
htmlSettings.setUserBodyTop("<form action='' method='post'>");
htmlSettings.setUserBodyTail("</form>");

javax.xml.transform.stream.StreamResult result = new javax.xml.transform.stream.StreamResult(os);
exporter.html(docx, result, htmlSettings);
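
For illustration, here is a minimal sketch of how the image target URI in the snippet above is put together, so it can be compared with the folder the images are actually served from. The helper `imageTargetUri`, the class name, and the sample path `C:\forms\leave.docx` are hypothetical and not part of docx4j. The only assumption is the `_files` suffix convention used above. Unlike the `lastIndexOf("\\")` call, it accepts both Windows and Unix separators.

```java
class ImageTargetUri {

    // Hypothetical helper (not docx4j API): builds the URI prefix that the
    // image links in the generated HTML are expected to start with.
    static String imageTargetUri(String baseUri, String inputFilePath) {
        // Take whichever separator occurs last, so both "\" and "/" paths work.
        int cut = Math.max(inputFilePath.lastIndexOf('\\'), inputFilePath.lastIndexOf('/'));
        String fileName = inputFilePath.substring(cut + 1);
        // Make sure exactly one "/" sits between the base URI and the file name.
        String prefix = baseUri.endsWith("/") ? baseUri : baseUri + "/";
        return prefix + fileName + "_files";
    }

    public static void main(String[] args) {
        // Sample path is made up for the example.
        System.out.println(imageTargetUri("/PSMS_POC_Prototype/Converted", "C:\\forms\\leave.docx"));
    }
}
```

If the printed prefix does not match the URL under which the `_files` folder is reachable from the browser, the images will be extracted but will not display.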


Another thing that was really bugging me: I had a file that is on the long side (abt 60 pages iirc), and that file would cause my Tomcat to hit GC overhead limits. I had overcome that with 2.8.1 before, but now I get that problem again.

I had probably been doing something wrong before I updated my version of docx4j and just never knew.

Re: Need help with images in documents when converting to HT

Posted: Tue Nov 26, 2013 3:32 pm
by jason
alucard81 wrote: that file would cause my Tomcat to hit GC overhead limits. I had overcome that with 2.8.1 before, but now I get that problem again.


The first thing I do when I install Tomcat is give it a decent amount of memory.

Code:
JAVA_OPTS=-Xmx4096M -XX:PermSize=256m -XX:MaxPermSize=256m


What are your settings? Further discussion on this topic ought to be in the deployment/tomcat forum :-)
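
One way to check which settings the JVM actually picked up is to ask the JVM itself. This is a minimal sketch using only standard JDK calls. The class and method names are made up for the example, and to report Tomcat's values it would have to run inside the Tomcat JVM (for instance, called from a servlet or JSP) rather than as a separate program.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JvmMemoryReport {

    // Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx when set).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Arguments passed to the JVM at startup, e.g. -Xmx and -XX options.
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```

If the `-Xmx` value from `JAVA_OPTS` does not show up in the argument list, the variable is probably not being read by the script or service that starts Tomcat.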

Re: Need help with images in documents when converting to HT

Posted: Thu Nov 28, 2013 3:14 pm
by alucard81
Everything works like a charm now.

Well, except for one docx that does not fully convert. I would not expect everything to convert properly, since this document was given to me by someone else as a sort of sample...

Anyway, I am going to attach the document below. Only the first page appears in the resulting HTML.