How to convert the HTML sample to a pdf file ?

Posted: Tue Feb 26, 2013 8:53 pm
by salocinx
Hi there.

I tried out the AltChunkHtml.java sample code from the samples directory. It works fine as long as I don't try to convert the result to a PDF using org.docx4j.convert.out.pdf.PdfConversion.

Here's my code:

Code:
public class AltChunkHtml {

    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
        String html = "<html><head><title>Import me</title></head><body><p>Hello World!</p></body></html>";
        AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(new PartName("/hw.html"));
        afiPart.setBinaryData(html.getBytes());
        afiPart.setContentType(new ContentType("text/html"));
        Relationship altChunkRel = wordMLPackage.getMainDocumentPart().addTargetPart(afiPart);
        // .. the bit in document body
        CTAltChunk ac = Context.getWmlObjectFactory().createCTAltChunk();
        ac.setId(altChunkRel.getId());
        wordMLPackage.getMainDocumentPart().addObject(ac);
        // .. content type
        wordMLPackage.getContentTypeManager().addDefaultContentType("html", "text/html");
        // .. save as pdf
        convert(wordMLPackage, FileType.PDF);
    }

    public static File convert(WordprocessingMLPackage document, FileType type) {
        File file = null;
        OutputStream os = null;
        try {
            Mapper fontMapper = new IdentityPlusMapper();
            document.setFontMapper(fontMapper);
            // keep the intermediate XSL-FO for debugging
            File fo = new File("C:/report.fo");
            file = new File("C:/report.pdf");
            org.docx4j.convert.out.pdf.PdfConversion converter = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(document);
            ((org.docx4j.convert.out.pdf.viaXSLFO.Conversion) converter).setSaveFO(fo);
            os = new FileOutputStream(file);
            converter.output(os, new PdfSettings());
        } catch (FileNotFoundException ex) {
            Logger.getLogger(AltChunkHtml.class.getName()).log(Level.SEVERE, null, ex);
        } catch (Docx4JException ex) {
            Logger.getLogger(AltChunkHtml.class.getName()).log(Level.SEVERE, null, ex);
        } catch (Exception ex) {
            Logger.getLogger(AltChunkHtml.class.getName()).log(Level.SEVERE, null, ex);
        } finally {
            try {
                if (os != null) {
                    os.close();
                }
            } catch (IOException ex) {
                // log against this class (the original referenced an unrelated TemplateManager)
                Logger.getLogger(AltChunkHtml.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
        return file;
    }

}


And this is the resulting error:

Code:
org.docx4j.openpackaging.exceptions.Docx4JException: FOP issues
   at org.docx4j.convert.out.pdf.viaXSLFO.Conversion.output(Conversion.java:374)
   at ch.lawsuite.core.AltChunkHtml.convert(AltChunkHtml.java:55)
   at ch.lawsuite.core.AltChunkHtml.main(AltChunkHtml.java:41)
Caused by: javax.xml.transform.TransformerException: org.apache.fop.fo.ValidationException: "fo:flow" is missing child elements. Required content model: marker* (%block;)+ (See position 1:897)
   at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:502)
   at org.docx4j.convert.out.pdf.viaXSLFO.Conversion.output(Conversion.java:362)
   ... 2 more
Caused by: org.apache.fop.fo.ValidationException: "fo:flow" is missing child elements. Required content model: marker* (%block;)+ (See position 1:897)
   at org.apache.fop.events.ValidationExceptionFactory.createException(ValidationExceptionFactory.java:38)
   at org.apache.fop.events.EventExceptionManager.throwException(EventExceptionManager.java:54)
   at org.apache.fop.events.DefaultEventBroadcaster$1.invoke(DefaultEventBroadcaster.java:175)
   at $Proxy36.missingChildElement(Unknown Source)
   at org.apache.fop.fo.FONode.missingChildElementError(FONode.java:549)
   at org.apache.fop.fo.pagination.Flow.endOfNode(Flow.java:86)
   at org.apache.fop.fo.FOTreeBuilder$MainFOHandler.endElement(FOTreeBuilder.java:349)
   at org.apache.fop.fo.FOTreeBuilder.endElement(FOTreeBuilder.java:177)
   at org.apache.xalan.transformer.TransformerIdentityImpl.endElement(TransformerIdentityImpl.java:1102)
   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:601)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2939)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:647)
   at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
   at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
   at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
   at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
   at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:485)
   ... 3 more


What am I doing wrong here?

Many thanks for your help in advance! Best regards, Nick.

Re: How to convert the HTML sample to a pdf file ?

Posted: Tue Feb 26, 2013 10:02 pm
by jason
As it stands, your code embeds the HTML in the docx without converting it to WordML; it relies on Word (or another consuming application) to do that.

So the PDF conversion code is not seeing any content it can handle.
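To make the FOP message concrete: the validator wants at least one block-level child inside fo:flow, and an altChunk that has not been converted presumably contributes none. Here is a rough, JDK-only sketch (no docx4j or FOP calls) that checks an XSL-FO string for an fo:flow without element children. The class name FoFlowCheck, the method hasEmptyFlow, and both sample FO strings are made up for illustration; they are not output captured from docx4j.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class FoFlowCheck {

    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Returns true if any fo:flow in the given XSL-FO text has no element children. */
    static boolean hasEmptyFlow(String fo) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for the namespace lookup below
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fatal errors are still thrown
            doc = builder.parse(new InputSource(new StringReader(fo)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalArgumentException("Could not parse the FO text", e);
        }
        NodeList flows = doc.getElementsByTagNameNS(FO_NS, "flow");
        for (int i = 0; i < flows.getLength(); i++) {
            if (countElementChildren(flows.item(i)) == 0) {
                return true;
            }
        }
        return false;
    }

    /** Counts direct children that are elements (text and comments are ignored). */
    static int countElementChildren(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hand-written stand-ins for what the FO might look like.
        String head = "<fo:root xmlns:fo=\"" + FO_NS + "\"><fo:page-sequence>";
        String tail = "</fo:page-sequence></fo:root>";
        String empty = head + "<fo:flow flow-name=\"xsl-region-body\"/>" + tail;
        String filled = head + "<fo:flow flow-name=\"xsl-region-body\">"
                + "<fo:block>Hello World!</fo:block></fo:flow>" + tail;

        System.out.println(hasEmptyFlow(empty));  // prints true
        System.out.println(hasEmptyFlow(filled)); // prints false
    }
}
```

If the file you pass to setSaveFO gets written in your setup, you could read it into a string and run the same check on it to confirm that the flow really is empty.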

The AltChunkXHTMLRoundTrip sample shows how to get docx4j to convert the AltChunk (AlternativeFormatInputPart) to WordML, by calling wordMLPackage.getMainDocumentPart().convertAltChunks().

Do that before you do your PDF conversion.

Or avoid the AltChunk stuff altogether by simply converting XHTML directly. See the ConvertInXHTML* samples.
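One thing worth checking before going the XHTML route: the input most likely has to be well-formed XML, not just HTML that a browser tolerates (that is an assumption about the importer, though it is what the X in XHTML implies). A small JDK-only sketch of such a pre-check follows; the class name XhtmlWellFormedCheck and the method isWellFormed are hypothetical and not part of docx4j.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class XhtmlWellFormedCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("XML parser could not be used", e);
        }
    }

    public static void main(String[] args) {
        // The string from the first post happens to be well-formed already.
        String fromPost = "<html><head><title>Import me</title></head>"
                + "<body><p>Hello World!</p></body></html>";
        // Typical browser-style HTML with an unclosed <br> is not.
        String sloppy = "<html><body><p>Hello<br>World!</p></body></html>";

        System.out.println(isWellFormed(fromPost)); // prints true
        System.out.println(isWellFormed(sloppy));   // prints false
    }
}
```

The HTML string in the original post passes this check as it is, so it should be usable as XHTML input without changes; markup from other sources may need tidying first.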