Plutext

Posted: **Mon Aug 27, 2012 1:18 pm**

Hi all! Can docx4j convert doc to jpeg or other image formats? Thanks!

Posted: **Mon Aug 27, 2012 1:39 pm**

Assuming you mean docx (not doc), you have two approaches.

First is FOP (which docx4j uses for PDF output). If you google "FOP output formats", you'll see formats include "TIFF/PNG". I haven't tried this myself, but I know someone else used this approach to get PCL output successfully.

Second is docx to HTML to image format. Try googling "html to image java", and see http://stackoverflow.com/questions/2651 ... erver-side
Please note http://pigeonholdings.com/projects/flyi ... tml#xil_29

A modified flyingsaucer/xhtmlrenderer is what docx4j uses for its XHTML import stuff, so all other things being equal, using this for the other way as well would be a good approach.

Posted: **Wed Aug 29, 2012 12:58 am**

Thanks Jason! I have tried the two ways you mentioned, and it nearly works. I have a small font problem when converting the docx to HTML or PDF. My docx uses a Chinese font, 'SimSun', but the font ends up as 'Calibri' in both the generated HTML and the FO file, and the Chinese characters are replaced by '#' in the PDF or image. However, if I change 'SimSun' to a non-Chinese font such as 'Arial Unicode MS', the final font is still 'Arial Unicode MS', and the conversion is OK.
Also, the fonts installed on my machine, including 'SimSun', are correctly identified by calling 'PhysicalFonts.getPhysicalFonts()'.

This is my code for PDF:

Code:
String inputfilepath = "test.docx";
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));

// the PdfConversion object
org.docx4j.convert.out.pdf.PdfConversion c = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);

// for demo/debugging purposes, save the intermediate XSL FO
((org.docx4j.convert.out.pdf.viaXSLFO.Conversion) c).setSaveFO(new java.io.File(inputfilepath + ".fo"));

// PdfConversion writes to an output stream
String outputfilepath = System.getProperty("user.dir") + "/OUT_FontContent.pdf";
OutputStream os = new java.io.FileOutputStream(outputfilepath);

// OK, do it...
c.output(os, new PdfSettings());

When running the above code, some errors occur:
WARN org.docx4j.fonts.PhysicalFonts .addPhysicalFont line 214 - Aborting: file:/C:/Windows/FONTS/TrajanPro-Regular.otf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
INFO org.docx4j.fonts.IdentityPlusMapper .populateFontMappings line 115 - Skipped null key
WARN org.docx4j.fonts.IdentityPlusMapper .populateFontMappings line 131 - - - No physical font for: 宋体
INFO org.docx4j.fonts.microsoft.MicrosoftFontsRegistry .setupMicrosoftFontsRegistry line 50 - unmarshalling fonts.microsoft
ERROR org.docx4j.convert.out.pdf.viaXSLFO.Conversion .declareFonts line 141 - Document font 宋体 is not mapped to a physical font!

I want to know: is this caused by docx4j failing to detect the Chinese font '宋体' (SimSun) in the docx? I guess the '#' appears in the generated document because the substituted font 'Calibri' doesn't support Chinese characters.
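
The '#' placeholders show up where the substituted font has no glyph for a character. As a rough aid for spotting text that needs a CJK-capable font, here is a minimal sketch that checks whether a string contains Chinese (Han) characters. `HanTextCheck` and `containsHan` are hypothetical names, not docx4j API, and the check says nothing about which fonts are installed.

```java
/** Hypothetical helper (not docx4j API): detects Han ideographs in a string. */
final class HanTextCheck {

    /** Returns true if the text contains at least one Han (Chinese) ideograph. */
    static boolean containsHan(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        // A Latin-only font name contains no Han characters.
        System.out.println(containsHan("Calibri"));
        // The escapes below spell the Chinese font name reported in the log above.
        System.out.println(containsHan("\u5B8B\u4F53"));
    }
}
```

Text for which this returns true will only render correctly if the font that ends up in the FO or HTML output actually covers those characters.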

Posted: **Wed Aug 29, 2012 9:28 pm**

https://github.com/plutext/docx4j/commi ... 3705634ba2 adds basic support.

With that, I can see the SimSun font in PDF output.

Note, the code won't detect a Chinese font which is used only in the styles. That remains a TODO.

Posted: **Thu Aug 30, 2012 4:54 pm**

jason wrote: https://github.com/plutext/docx4j/commit/08ec3f1ce7d9fa7b75fb8d40118d443705634ba2 adds basic support.

With that, I can see the SimSun font in PDF output.

Note, the code won't detect a Chinese font which is used only in the styles. That remains a TODO.

Thanks Jason! I am looking forward to fuller support for Chinese fonts. For now, I can manually set the 'font-family' attribute of the 'fo:block' elements in the FO file to my desired font.
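
That manual edit of the saved FO file could be scripted. The following is only a sketch using the JDK's XML APIs: `FoFontPatcher` and `replaceFont` are hypothetical names, and patching `fo:inline` as well as `fo:block` is an assumption about where the font-family attribute may appear in the saved FO.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper (not docx4j API): rewrites font-family in an XSL-FO document. */
final class FoFontPatcher {

    /** The XSL-FO namespace URI. */
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Replaces font-family == fromFont with toFont on fo:block and fo:inline elements. */
    static String replaceFont(String foXml, String fromFont, String toFont) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(foXml)));

            for (String localName : new String[] {"block", "inline"}) {
                NodeList nodes = doc.getElementsByTagNameNS(FO_NS, localName);
                for (int i = 0; i < nodes.getLength(); i++) {
                    Element e = (Element) nodes.item(i);
                    if (fromFont.equals(e.getAttribute("font-family"))) {
                        e.setAttribute("font-family", toFont);
                    }
                }
            }

            // Serialize the patched DOM back to a string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not patch FO document", e);
        }
    }
}
```

For example, `FoFontPatcher.replaceFont(fo, "Calibri", "SimSun")` would switch every matching block from the substituted font to SimSun; the patched FO still has to be rendered by FOP with that font available.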

Posted: **Thu Jul 31, 2014 6:09 pm**

Hi buptstehc,

Were you able to convert a docx file to jpg or other image formats successfully? Could you send me the code? Thanks.

Posted: **Thu Jul 31, 2014 6:12 pm**

Hi Jason,

I tried the first approach you mentioned to convert docx to other image formats, but I can't find out how to change the FOP output format. Can you suggest something?

Posted: **Thu Jul 31, 2014 8:00 pm**

Have a look at https://github.com/plutext/docx4j/blob/ ... ocx4J.java
lines 465-490

Something like:

FOSettings settings = Docx4J.createFOSettings();
settings.setWmlPackage(wmlPackage);
settings.setApacheFopMime(mimeType); // a String such as "image/png" or "image/tiff"
Docx4J.toFO(settings, outputStream, Docx4J.FLAG_NONE);
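
The MIME type string has to be exact, so it may help to derive it from the output file name rather than typing it by hand. A minimal sketch, assuming only PDF, PNG and TIFF output are of interest; `FopMimeTypes` and `forFileName` are hypothetical names, not part of docx4j or FOP.

```java
import java.util.Locale;

/** Hypothetical helper (not docx4j or FOP API): picks a MIME type from a file extension. */
final class FopMimeTypes {

    /** Returns the MIME type for the given output file name, or throws if unknown. */
    static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "pdf":
                return "application/pdf";
            case "png":
                return "image/png";
            case "tif":
            case "tiff":
                return "image/tiff";
            default:
                throw new IllegalArgumentException("No MIME type known for: " + fileName);
        }
    }
}
```

The result would then be passed to `settings.setApacheFopMime(...)`, for instance `FopMimeTypes.forFileName("sample.tif")`.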

Posted: **Thu Jul 31, 2014 11:20 pm**

Hi Jason,

I tried implementing this solution. On setting ApacheFopMime = "image/tiff", I get the exception below:

Code:
SEVERE: Error while rendering page 2. Reason: java.lang.RuntimeException: Int or float buffers require 32-bit data.
java.lang.RuntimeException: Int or float buffers require 32-bit data.
    at org.apache.xmlgraphics.image.codec.tiff.TIFFImageEncoder.encode(TIFFImageEncoder.java:250)
    at org.apache.xmlgraphics.image.codec.tiff.TIFFImageEncoder.encodeMultiple(TIFFImageEncoder.java:166)
    at org.apache.xmlgraphics.image.writer.internal.TIFFImageWriter$TIFFMultiImageWriter.writeImage(TIFFImageWriter.java:130)
    at org.apache.fop.render.bitmap.AbstractBitmapDocumentHandler.endPageContent(AbstractBitmapDocumentHandler.java:321)
    at org.apache.fop.render.intermediate.util.IFDocumentHandlerProxy.endPageContent(IFDocumentHandlerProxy.java:157)
    at org.apache.fop.render.intermediate.IFRenderer.renderPage(IFRenderer.java:599)
    at org.apache.fop.area.RenderPagesModel.renderPage(RenderPagesModel.java:193)
    at org.apache.fop.area.RenderPagesModel.checkPreparedPages(RenderPagesModel.java:174)
    at org.apache.fop.area.RenderPagesModel.addPage(RenderPagesModel.java:146)
    at org.apache.fop.layoutmgr.AbstractPageSequenceLayoutManager.finishPage(AbstractPageSequenceLayoutManager.java:312)
    at org.apache.fop.layoutmgr.PageSequenceLayoutManager.finishPage(PageSequenceLayoutManager.java:191)
    at org.apache.fop.layoutmgr.AbstractPageSequenceLayoutManager.makeNewPage(AbstractPageSequenceLayoutManager.java:283)
    at org.apache.fop.layoutmgr.PageSequenceLayoutManager.makeNewPage(PageSequenceLayoutManager.java:151)
    at org.apache.fop.layoutmgr.PageBreaker.handleBreakTrait(PageBreaker.java:545)
    at org.apache.fop.layoutmgr.PageBreaker.startPart(PageBreaker.java:444)
    at org.apache.fop.layoutmgr.AbstractBreaker.addAreas(AbstractBreaker.java:530)
    at org.apache.fop.layoutmgr.AbstractBreaker.addAreas(AbstractBreaker.java:481)
    at org.apache.fop.layoutmgr.PageBreaker.doPhase3(PageBreaker.java:313)
    at org.apache.fop.layoutmgr.AbstractBreaker.doLayout(AbstractBreaker.java:436)
    at org.apache.fop.layoutmgr.PageBreaker.doLayout(PageBreaker.java:90)
    at org.apache.fop.layoutmgr.PageSequenceLayoutManager.activateLayout(PageSequenceLayoutManager.java:113)
    at org.apache.fop.area.AreaTreeHandler.endPageSequence(AreaTreeHandler.java:267)
    at org.apache.fop.fo.pagination.PageSequence.endOfNode(PageSequence.java:128)
    at org.apache.fop.fo.FOTreeBuilder$MainFOHandler.endElement(FOTreeBuilder.java:347)
    at org.apache.fop.fo.FOTreeBuilder.endElement(FOTreeBuilder.java:181)
    at org.apache.xalan.transformer.TransformerIdentityImpl.endElement(TransformerIdentityImpl.java:1102)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:485)
    at org.docx4j.convert.out.fo.renderers.FORendererApacheFOP.render(FORendererApacheFOP.java:204)
    at org.docx4j.convert.out.fo.renderers.FORendererApacheFOP.render(FORendererApacheFOP.java:153)
    at org.docx4j.convert.out.fo.AbstractFOExporter.postprocess(AbstractFOExporter.java:135)
    at org.docx4j.convert.out.fo.AbstractFOExporter.postprocess(AbstractFOExporter.java:45)
    at org.docx4j.convert.out.common.AbstractExporter.export(AbstractExporter.java:82)
    at org.docx4j.Docx4J.toFO(Docx4J.java:475)
    at Drivers.docx4jConverter.main(docx4jConverter.java:50)
Jul 31, 2014 5:43:49 PM org.docx4j.convert.out.common.AbstractExporter export
SEVERE: Exception exporting package
java.lang.RuntimeException: Int or float buffers require 32-bit data.
    at org.apache.xmlgraphics.image.codec.tiff.TIFFImageEncoder.encode(TIFFImageEncoder.java:250)
    at org.apache.xmlgraphics.image.codec.tiff.TIFFImageEncoder.encodeMultiple(TIFFImageEncoder.java:166)
    at org.apache.xmlgraphics.image.writer.internal.TIFFImageWriter$TIFFMultiImageWriter.writeImage(TIFFImageWriter.java:130)
    at org.apache.fop.render.bitmap.AbstractBitmapDocumentHandler.endPageContent(AbstractBitmapDocumentHandler.java:321)
    at org.apache.fop.render.intermediate.util.IFDocumentHandlerProxy.endPageContent(IFDocumentHandlerProxy.java:157)
    at org.apache.fop.render.intermediate.IFRenderer.renderPage(IFRenderer.java:599)
    at org.apache.fop.area.RenderPagesModel.renderPage(RenderPagesModel.java:193)
    at org.apache.fop.area.RenderPagesModel.checkPreparedPages(RenderPagesModel.java:174)
    at org.apache.fop.area.RenderPagesModel.addPage(RenderPagesModel.java:146)
    at org.apache.fop.layoutmgr.AbstractPageSequenceLayoutManager.finishPage(AbstractPageSequenceLayoutManager.java:312)
    at org.apache.fop.layoutmgr.PageSequenceLayoutManager.finishPage(PageSequenceLayoutManager.java:191)
    at org.apache.fop.layoutmgr.AbstractPageSequenceLayoutManager.makeNewPage(AbstractPageSequenceLayoutManager.java:283)
    at org.apache.fop.layoutmgr.PageSequenceLayoutManager.makeNewPage(PageSequenceLayoutManager.java:151)
    at org.apache.fop.layoutmgr.PageBreaker.handleBreakTrait(PageBreaker.java:545)
    at org.apache.fop.layoutmgr.PageBreaker.startPart(PageBreaker.java:444)
    at org.apache.fop.layoutmgr.AbstractBreaker.addAreas(AbstractBreaker.java:530)
    at org.apache.fop.layoutmgr.AbstractBreaker.addAreas(AbstractBreaker.java:481)
    at org.apache.fop.layoutmgr.PageBreaker.doPhase3(PageBreaker.java:313)
    at org.apache.fop.layoutmgr.AbstractBreaker.doLayout(AbstractBreaker.java:436)
    at org.apache.fop.layoutmgr.PageBreaker.doLayout(PageBreaker.java:90)
    at org.apache.fop.layoutmgr.PageSequenceLayoutManager.activateLayout(PageSequenceLayoutManager.java:113)
    at org.apache.fop.area.AreaTreeHandler.endPageSequence(AreaTreeHandler.java:267)
    at org.apache.fop.fo.pagination.PageSequence.endOfNode(PageSequence.java:128)
    at org.apache.fop.fo.FOTreeBuilder$MainFOHandler.endElement(FOTreeBuilder.java:347)
    at org.apache.fop.fo.FOTreeBuilder.endElement(FOTreeBuilder.java:181)
    at org.apache.xalan.transformer.TransformerIdentityImpl.endElement(TransformerIdentityImpl.java:1102)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:485)
    at org.docx4j.convert.out.fo.renderers.FORendererApacheFOP.render(FORendererApacheFOP.java:204)
    at org.docx4j.convert.out.fo.renderers.FORendererApacheFOP.render(FORendererApacheFOP.java:153)
    at org.docx4j.convert.out.fo.AbstractFOExporter.postprocess(AbstractFOExporter.java:135)
    at org.docx4j.convert.out.fo.AbstractFOExporter.postprocess(AbstractFOExporter.java:45)
    at org.docx4j.convert.out.common.AbstractExporter.export(AbstractExporter.java:82)
    at org.docx4j.Docx4J.toFO(Docx4J.java:475)
    at Drivers.docx4jConverter.main(docx4jConverter.java:50)

On setting ApacheFopMime = "image/png", I get a PNG image of the first page only. I want images for all pages in the docx file.

This is my code:

Code:
String inputfilepath = "D:\\Office Conversion Test\\sample.docx";
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));

String outputfilepath = "D:\\Office Conversion Test\\sample.tif";
OutputStream os = new java.io.FileOutputStream(outputfilepath);

FOSettings settings = Docx4J.createFOSettings();
settings.setWmlPackage(wordMLPackage);
settings.setApacheFopMime("image/tiff");
Docx4J.toFO(settings, os, Docx4J.FLAG_NONE);

Posted: **Fri Aug 01, 2014 7:43 am**

These are both FOP questions; if Google doesn't help, you'll have more luck getting an answer on the FOP user mailing list, or perhaps at StackOverflow.

For the first question, you should try to create a short FO file which replicates the issue. docx4j can save the intermediate FO for you:

settings.setFoDumpFile(new java.io.File(inputfilepath + ".fo"));

Alternatively, you could use docx4j to create a PDF, then try using PDFBox to convert that PDF to images:

http://pdfbox.apache.org/docs/1.8.3/jav ... riter.html
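
If the goal is a single multi-page TIFF, one possible route (an assumption, not something tested in this thread) is to render each page to a `BufferedImage` by whatever means works, for example from the PDF, and then assemble the pages with the JDK's own ImageIO instead of FOP's TIFF encoder. The sketch below assumes Java 9 or later, which ships a TIFF plugin; `MultiPageTiff` and its methods are hypothetical names.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

/** Hypothetical helper: assembles already-rendered pages into one multi-page TIFF. */
final class MultiPageTiff {

    /** Encodes the given page images, in order, as a single multi-page TIFF. */
    static byte[] write(List<BufferedImage> pages) {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            throw new IllegalStateException("No TIFF ImageWriter found (needs Java 9+)");
        }
        ImageWriter writer = writers.next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.prepareWriteSequence(null); // default stream metadata
            for (BufferedImage page : pages) {
                writer.writeToSequence(new IIOImage(page, null, null), null);
            }
            writer.endWriteSequence();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    /** Reads the TIFF back and returns how many images (pages) it contains. */
    static int countPages(byte[] tiff) {
        ImageReader reader = ImageIO.getImageReadersByFormatName("tiff").next();
        try (ImageInputStream in =
                new MemoryCacheImageInputStream(new ByteArrayInputStream(tiff))) {
            reader.setInput(in);
            return reader.getNumImages(true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            reader.dispose();
        }
    }
}
```

The same page images could instead be written one file per page with `ImageIO.write(page, "png", file)` if separate PNGs are wanted.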

convert doc to jpeg
