
convert doc to jpeg

Posted: Mon Aug 27, 2012 1:18 pm
by buptstehc
Hi all! Can docx4j convert doc to JPEG or other image formats? Thanks!

Re: convert doc to jpeg

Posted: Mon Aug 27, 2012 1:39 pm
by jason
Assuming you mean docx (not doc), you have two approaches.

First is FOP (which docx4j uses for PDF output). If you google "FOP output formats", you'll see formats include "TIFF/PNG". I haven't tried this myself, but I know someone else used this approach to get PCL output successfully.

Second is docx to HTML to image format. Try googling "html to image java", and see http://stackoverflow.com/questions/2651 ... erver-side
Please note http://pigeonholdings.com/projects/flyi ... tml#xil_29

A modified flyingsaucer/xhtmlrenderer is what docx4j uses for its XHTML import stuff, so all other things being equal, using this for the other way as well would be a good approach.

Re: convert doc to jpeg

Posted: Wed Aug 29, 2012 12:58 am
by buptstehc
Thanks, jason! I have tried the two approaches you mentioned, and they nearly work! I have a small font problem when converting the docx to HTML or PDF. My docx uses a Chinese font, 'SimSun', but the font ends up as 'Calibri' in both the generated HTML and the FO file, and the Chinese characters are replaced by '#' in the PDF or image. However, if I change 'SimSun' to a non-Chinese font such as 'arial unicode ms', the final font is still 'arial unicode ms', and the conversion works.
Besides, the fonts installed on my machine, including 'SimSun', are correctly identified by calling 'PhysicalFonts.getPhysicalFonts()'.

This is my code for PDF:
Code:
        String inputfilepath = "test.docx";
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));

        // the PdfConversion object
        org.docx4j.convert.out.pdf.PdfConversion c = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);

        // for demo/debugging purposes, save the intermediate XSL FO
        ((org.docx4j.convert.out.pdf.viaXSLFO.Conversion) c).setSaveFO(new java.io.File(inputfilepath + ".fo"));

        // PdfConversion writes to an output stream
        String outputfilepath = System.getProperty("user.dir") + "/OUT_FontContent.pdf";
        OutputStream os = new java.io.FileOutputStream(outputfilepath);

        // OK, do it...
        c.output(os, new PdfSettings());


When running the above code, these warnings and errors are logged:
WARN org.docx4j.fonts.PhysicalFonts .addPhysicalFont line 214 - Aborting: file:/C:/Windows/FONTS/TrajanPro-Regular.otf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
INFO org.docx4j.fonts.IdentityPlusMapper .populateFontMappings line 115 - Skipped null key
WARN org.docx4j.fonts.IdentityPlusMapper .populateFontMappings line 131 - - - No physical font for: 宋体
INFO org.docx4j.fonts.microsoft.MicrosoftFontsRegistry .setupMicrosoftFontsRegistry line 50 - unmarshalling fonts.microsoft
ERROR org.docx4j.convert.out.pdf.viaXSLFO.Conversion .declareFonts line 141 - Document font 宋体 is not mapped to a physical font!


I want to know: is this caused by docx4j failing to detect a Chinese font such as '宋体' (SimSun) in the docx? I guess the '#' appears in the generated document because the substituted font, 'Calibri', doesn't support Chinese characters.
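
To narrow down a problem like the one described above, it can help to check two things outside docx4j: whether the text actually contains Chinese (Han) characters, and whether a given font name resolves to something with glyphs for them. The following is a minimal sketch using only the JDK; the class and method names (`CjkFontCheck`, `containsHan`, `canRender`) are made up for illustration. Note that `canRender` reflects how the JVM resolves the font name, not how FOP or docx4j map fonts, and the JVM may silently fall back to a default font for names it does not know, so treat its answer as a hint only.

```java
import java.awt.Font;

class CjkFontCheck {

    /** Returns true if the text contains at least one Han (Chinese) character. */
    static boolean containsHan(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    /**
     * Returns true if the font the JVM resolves for this name has glyphs for
     * every character in the text. Assumption: this is JVM font resolution,
     * which may differ from the mapping FOP ends up using.
     */
    static boolean canRender(String fontName, String text) {
        Font font = new Font(fontName, Font.PLAIN, 12);
        return font.canDisplayUpTo(text) == -1;
    }

    public static void main(String[] args) {
        String sample = "\u5b8b\u4f53"; // the font name from the log above
        System.out.println("contains Han: " + containsHan(sample));
        for (String name : new String[] {"SimSun", "Calibri", "Arial Unicode MS"}) {
            System.out.println(name + " can render sample: " + canRender(name, sample));
        }
    }
}
```

If `containsHan` is true for the document text but the font that ends up in the FO file cannot render it, that would be consistent with the '#' substitution seen in the output.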

Re: convert doc to jpeg

Posted: Wed Aug 29, 2012 9:28 pm
by jason
https://github.com/plutext/docx4j/commi ... 3705634ba2 adds basic support.

With that, I can see the SimSun font in PDF output.

Note, the code won't detect a Chinese font which is used only in the styles. That remains a TODO.

Re: convert doc to jpeg

Posted: Thu Aug 30, 2012 4:54 pm
by buptstehc
jason wrote: https://github.com/plutext/docx4j/commit/08ec3f1ce7d9fa7b75fb8d40118d443705634ba2 adds basic support.

With that, I can see the SimSun font in PDF output.

Note, the code won't detect a Chinese font which is used only in the styles. That remains a TODO.


Thanks, jason! I am looking forward to fuller support for Chinese fonts. For now, I can manually set the 'font-family' attribute of the 'fo:block' elements in the FO file to my desired font.
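
The manual workaround just described (setting 'font-family' on the 'fo:block' elements of the saved FO file) could be scripted along the following lines. This is only a sketch using the JDK's DOM and transformer APIs; `FoFontPatcher` and `setBlockFont` are hypothetical names, not part of docx4j or FOP. It also assumes the font is set on `fo:block` itself: if the FO sets 'font-family' on nested elements such as `fo:inline`, those would need the same treatment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FoFontPatcher {

    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Sets font-family on every fo:block in the given XSL-FO markup and returns the result. */
    static String setBlockFont(String foXml, String fontFamily) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for getElementsByTagNameNS
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(foXml)));

            // Overwrite (or add) the attribute on each fo:block element.
            NodeList blocks = doc.getElementsByTagNameNS(FO_NS, "block");
            for (int i = 0; i < blocks.getLength(); i++) {
                ((Element) blocks.item(i)).setAttribute("font-family", fontFamily);
            }

            // Serialize the patched DOM back to a string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not patch FO document", e);
        }
    }

    public static void main(String[] args) {
        String fo = "<fo:root xmlns:fo=\"" + FO_NS + "\">"
                + "<fo:block font-family=\"Calibri\">text</fo:block></fo:root>";
        System.out.println(setBlockFont(fo, "SimSun"));
    }
}
```

The patched FO would then still have to be rendered by FOP with a configuration in which the chosen font is available.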

Re: convert doc to jpeg

Posted: Thu Jul 31, 2014 6:09 pm
by Nikhil Barar
Hi buptstehc,

Were you able to convert a docx file to JPG or other image formats successfully? Could you send me the code for that? Thanks.

Re: convert doc to jpeg

Posted: Thu Jul 31, 2014 6:12 pm
by Nikhil Barar
Hi Jason,

I tried the first approach you mentioned to convert docx to other image formats, but I can't work out how to change the FOP output format. Can you suggest something?

Re: convert doc to jpeg

Posted: Thu Jul 31, 2014 8:00 pm
by jason
Have a look at https://github.com/plutext/docx4j/blob/ ... ocx4J.java
lines 465-490

Something like:

FOSettings settings = Docx4J.createFOSettings();
settings.setWmlPackage(wmlPackage);
settings.setApacheFopMime(mimeType); // a FOP output MIME type, e.g. "image/png" or "image/tiff"
Docx4J.toFO(settings, outputStream, Docx4J.FLAG_NONE);

Re: convert doc to jpeg

Posted: Thu Jul 31, 2014 11:20 pm
by Nikhil Barar
Hi Jason,

I tried implementing this solution. With ApacheFopMime set to "image/tiff", I get the exception below:

Code:
SEVERE: Error while rendering page 2. Reason: java.lang.RuntimeException: Int or float buffers require 32-bit data.
java.lang.RuntimeException: Int or float buffers require 32-bit data.
   at org.apache.xmlgraphics.image.codec.tiff.TIFFImageEncoder.encode(TIFFImageEncoder.java:250)
   at org.apache.xmlgraphics.image.codec.tiff.TIFFImageEncoder.encodeMultiple(TIFFImageEncoder.java:166)
   at org.apache.xmlgraphics.image.writer.internal.TIFFImageWriter$TIFFMultiImageWriter.writeImage(TIFFImageWriter.java:130)
   at org.apache.fop.render.bitmap.AbstractBitmapDocumentHandler.endPageContent(AbstractBitmapDocumentHandler.java:321)
   at org.apache.fop.render.intermediate.util.IFDocumentHandlerProxy.endPageContent(IFDocumentHandlerProxy.java:157)
   at org.apache.fop.render.intermediate.IFRenderer.renderPage(IFRenderer.java:599)
   at org.apache.fop.area.RenderPagesModel.renderPage(RenderPagesModel.java:193)
   at org.apache.fop.area.RenderPagesModel.checkPreparedPages(RenderPagesModel.java:174)
   at org.apache.fop.area.RenderPagesModel.addPage(RenderPagesModel.java:146)
   at org.apache.fop.layoutmgr.AbstractPageSequenceLayoutManager.finishPage(AbstractPageSequenceLayoutManager.java:312)
   at org.apache.fop.layoutmgr.PageSequenceLayoutManager.finishPage(PageSequenceLayoutManager.java:191)
   at org.apache.fop.layoutmgr.AbstractPageSequenceLayoutManager.makeNewPage(AbstractPageSequenceLayoutManager.java:283)
   at org.apache.fop.layoutmgr.PageSequenceLayoutManager.makeNewPage(PageSequenceLayoutManager.java:151)
   at org.apache.fop.layoutmgr.PageBreaker.handleBreakTrait(PageBreaker.java:545)
   at org.apache.fop.layoutmgr.PageBreaker.startPart(PageBreaker.java:444)
   at org.apache.fop.layoutmgr.AbstractBreaker.addAreas(AbstractBreaker.java:530)
   at org.apache.fop.layoutmgr.AbstractBreaker.addAreas(AbstractBreaker.java:481)
   at org.apache.fop.layoutmgr.PageBreaker.doPhase3(PageBreaker.java:313)
   at org.apache.fop.layoutmgr.AbstractBreaker.doLayout(AbstractBreaker.java:436)
   at org.apache.fop.layoutmgr.PageBreaker.doLayout(PageBreaker.java:90)
   at org.apache.fop.layoutmgr.PageSequenceLayoutManager.activateLayout(PageSequenceLayoutManager.java:113)
   at org.apache.fop.area.AreaTreeHandler.endPageSequence(AreaTreeHandler.java:267)
   at org.apache.fop.fo.pagination.PageSequence.endOfNode(PageSequence.java:128)
   at org.apache.fop.fo.FOTreeBuilder$MainFOHandler.endElement(FOTreeBuilder.java:347)
   at org.apache.fop.fo.FOTreeBuilder.endElement(FOTreeBuilder.java:181)
   at org.apache.xalan.transformer.TransformerIdentityImpl.endElement(TransformerIdentityImpl.java:1102)
   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
   at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
   at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
   at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
   at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
   at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:485)
   at org.docx4j.convert.out.fo.renderers.FORendererApacheFOP.render(FORendererApacheFOP.java:204)
   at org.docx4j.convert.out.fo.renderers.FORendererApacheFOP.render(FORendererApacheFOP.java:153)
   at org.docx4j.convert.out.fo.AbstractFOExporter.postprocess(AbstractFOExporter.java:135)
   at org.docx4j.convert.out.fo.AbstractFOExporter.postprocess(AbstractFOExporter.java:45)
   at org.docx4j.convert.out.common.AbstractExporter.export(AbstractExporter.java:82)
   at org.docx4j.Docx4J.toFO(Docx4J.java:475)
   at Drivers.docx4jConverter.main(docx4jConverter.java:50)

Jul 31, 2014 5:43:49 PM org.docx4j.convert.out.common.AbstractExporter export
SEVERE: Exception exporting package
java.lang.RuntimeException: Int or float buffers require 32-bit data.


With ApacheFopMime set to "image/png", I get a PNG image of the first page only. I want images for all pages in the docx file.

This is my code:
Code:
        String inputfilepath = "D:\\Office Conversion Test\\sample.docx";
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));

        String outputfilepath = "D:\\Office Conversion Test\\sample.tif";
        OutputStream os = new java.io.FileOutputStream(outputfilepath);

        FOSettings settings = Docx4J.createFOSettings();
        settings.setWmlPackage(wordMLPackage);
        settings.setApacheFopMime("image/tiff");
        Docx4J.toFO(settings, os, Docx4J.FLAG_NONE);

Re: convert doc to jpeg

Posted: Fri Aug 01, 2014 7:43 am
by jason
These are both FOP questions; if Google doesn't help, you'll have more luck getting an answer on the FOP user mailing list, or perhaps at StackOverflow.

For the first question, you should try to create a short FO file which replicates the issue. docx4j can save the intermediate FO for you:

settings.setFoDumpFile(new java.io.File(inputfilepath + ".fo"));

Alternatively, you could use docx4j to create a PDF, then try using PDFBox to convert that PDF to images:

http://pdfbox.apache.org/docs/1.8.3/jav ... riter.html
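
The formats mentioned in this thread for FOP are PNG and TIFF, while the original question asked for JPEG. Once a page exists as a `BufferedImage` (for example, read back from a PNG, or produced by a PDF rendering library), re-encoding it as JPEG needs only the JDK's `javax.imageio`. The sketch below assumes that starting point; `PageToJpeg` and `toJpeg` are illustrative names, not part of docx4j, FOP, or PDFBox. JPEG has no alpha channel, so the page is first flattened onto a white background.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import javax.imageio.ImageIO;

class PageToJpeg {

    /** Re-encodes a rendered page as JPEG bytes, flattening any transparency onto white. */
    static byte[] toJpeg(BufferedImage page) {
        // Copy onto an opaque RGB canvas, since JPEG cannot store alpha.
        BufferedImage rgb = new BufferedImage(
                page.getWidth(), page.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(page, 0, 0, null);
        } finally {
            g.dispose();
        }
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(rgb, "jpg", out)) {
                throw new IllegalStateException("No JPEG writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Usage: java PageToJpeg page.png page.jpg (both paths supplied by the caller). */
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PageToJpeg <input image> <output jpg>");
            return;
        }
        BufferedImage page = ImageIO.read(new File(args[0]));
        if (page == null) {
            System.err.println("Unsupported or unreadable image: " + args[0]);
            return;
        }
        Files.write(new File(args[1]).toPath(), toJpeg(page));
    }
}
```

For a multi-page document, the same method would be called once per page image, writing each result to its own file.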