Export to PDF

Posted: Wed Apr 07, 2010 10:31 pm
by stuart.ledwich
I have tried export to PDF and it appears to work well for the most part, but there are a couple of things that I would like some help with.

Currently field codes are not supported. I would like to look at contributing some code to support field codes in the export to PDF process. Currently, when a field is encountered, it prints UNSUPPORTED in bold red letters. Can you give me some advice on where to start?

Also, images in PDF: I exported a docx which contained 2 images, but the resulting PDF did not have the images in it. Is this a bug, or is this also not supported?

Thank you for your help in advance.
Regards,
Stuart Ledwich

Re: Export to PDF

Posted: Wed Apr 07, 2010 10:50 pm
by jason
stuart.ledwich wrote: Currently field codes are not supported. I would like to look at contributing some code to support field codes in the export to PDF process. Currently, when a field is encountered, it prints UNSUPPORTED in bold red letters. Can you give me some advice on where to start?


You need a template which matches the field in http://dev.plutext.org/svn/docx4j/trunk ... cx2fo.xslt
(The red letters UNSUPPORTED are the default template)

You'll probably want to call a Java extension function to do the actual processing. See http://dev.plutext.org/svn/docx4j/trunk ... rsion.java about halfway down for examples of these. An effective pattern is to feed the DOM node into the extension function, where it can be converted to a JAXB object and processed via docx4j. The extension can then return a DOM node or a string, as appropriate (probably a string for field resolution).
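
A minimal sketch of that pattern, using only the JDK so it runs on its own. `FieldExtension` and `resolveField` are hypothetical names, not part of docx4j, and the JAXB conversion step is skipped: the function reads the `w:instr` attribute of a `w:fldSimple` node straight from the DOM and returns a string. Falling back to a bracketed field type when there is no cached result is an assumption made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical extension class; name and behaviour are illustrative only.
class FieldExtension {

    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Intended to be called from the stylesheet with the matched w:fldSimple node.
    // Returns the text that should appear in the output for that field.
    public static String resolveField(Node fldSimple) {
        if (!(fldSimple instanceof Element)) {
            return "";
        }
        // The cached field result is the text inside the field's runs.
        String cached = fldSimple.getTextContent().trim();
        if (!cached.isEmpty()) {
            return cached;
        }
        // No cached result: fall back to the field type, e.g. "[PAGE]".
        String instr = ((Element) fldSimple).getAttributeNS(W_NS, "instr").trim();
        if (instr.isEmpty()) {
            return "";
        }
        return "[" + instr.split("\\s+")[0].toUpperCase() + "]";
    }

    // Helper for trying the function outside a transform.
    static Node parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:fldSimple xmlns:w=\"" + W_NS + "\" w:instr=\" AUTHOR \">"
                   + "<w:r><w:t>Stuart</w:t></w:r></w:fldSimple>";
        System.out.println(resolveField(parse(xml)));
    }
}
```

How the stylesheet hands the node to the function depends on the XSLT processor, so the parameter type may need adjusting.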

stuart.ledwich wrote: Also, images in PDF: I exported a docx which contained 2 images, but the resulting PDF did not have the images in it. Is this a bug, or is this also not supported?


Images are generally supported. What type of image is it? You could attach a docx with just the image in it to this thread, or email it to me (jason@plutext.org) and I'll take a look.

Re: Export to PDF

Posted: Thu Apr 08, 2010 1:37 am
by stuart.ledwich
Thanks for coming back to me so quickly. I have included a simple docx to try. I have exported it and the text appears but no image. Thanks again.

Re: Export to PDF

Posted: Thu Apr 08, 2010 2:42 am
by stuart.ledwich
Just to flesh out my previous post.

I have also included the output PDF document.

Here is the code I used to convert the document:

Code:
      String inputfilepath = "/home/sledwich/temp/Nice blue hills";
      WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File(inputfilepath + ".docx"));
      // Fonts identity mapping - best on Microsoft Windows
      wordMLPackage.setFontMapper(new IdentityPlusMapper());

      // Set up converter
      org.docx4j.convert.out.pdf.PdfConversion c
            = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);

      // Write to output stream, closing it when done
      OutputStream os = new java.io.FileOutputStream("/tmp/test.pdf");
      try {
         c.output(os);
      } finally {
         os.close();
      }

Re: Export to PDF

Posted: Thu Apr 08, 2010 9:52 am
by jason
Thanks for this report. Now fixed in SVN: http://dev.plutext.org/trac/docx4j/changeset/1118
That was a regression introduced post v2.3.0 when I refactored the image handling extensions.

Re: Export to PDF

Posted: Thu Apr 08, 2010 11:10 am
by stuart.ledwich
Thank you - that fixed the problem.

I'm going to start looking at the field codes tomorrow. Thanks again.

Re: Export to PDF

Posted: Thu Apr 08, 2010 12:30 pm
by jason
Great, it would be good to have field support in the exporters. (If you can make it work for PDF, I'll easily be able to add the same functionality to HTML.)

Re: Export to PDF

Posted: Fri Apr 23, 2010 7:12 pm
by stuart.ledwich
I have made a very small change to skip certain fields in the docx. This is largely because fields also put their content into the document body, so the field itself has very little function in the PDF output; this patch simply hides them.

It is contributed on the basis of the document at http://dev.plutext.org/docx4j/docx4j_In ... butor.docx

Another problem with Images
I also seem to have hit another problem with images. The difference seems to be the tag the images are stored in: you previously fixed an image problem for me in this thread, but this image appears in a w:pict tag, which does not appear to work. I have included an example image that shows the problem.

Hope you can help. Thank you very much for all your assistance so far.

Re: Export to PDF

Posted: Sat Apr 24, 2010 1:23 am
by jason
Hello Stuart

Thanks for reporting the problem with E10 images.

http://dev.plutext.org/trac/docx4j/changeset/1122 fixes this. The key line is:

Code:

String imgRelId = converter.imageData.getOtherAttributes().get(
        new QName("http://schemas.openxmlformats.org/officeDocument/2006/relationships", "id"));
// NB r:id is not given by getId()!
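
To illustrate why the lookup needs the qualified name, here is a small JDK-only sketch. It uses the plain DOM API rather than docx4j's JAXB classes, and the class name `RelIdLookup` and the sample attribute values are made up for the example. An element can carry both a plain `id` attribute and an `r:id` attribute in the relationships namespace; only a namespace-qualified lookup finds the latter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class RelIdLookup {

    static final String R_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static final String V_NS = "urn:schemas-microsoft-com:vml";

    // Namespace-qualified lookup: returns the value of r:id.
    static String relationshipId(Element imageData) {
        return imageData.getAttributeNS(R_NS, "id");
    }

    // Unqualified lookup: returns the plain id attribute, not r:id.
    static String plainId(Element imageData) {
        return imageData.getAttribute("id");
    }

    static Element parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element imageData = parse(
            "<v:imagedata xmlns:v=\"" + V_NS + "\" xmlns:r=\"" + R_NS + "\""
            + " id=\"img1\" r:id=\"rId5\"/>");
        System.out.println("plain id: " + plainId(imageData));
        System.out.println("r:id:     " + relationshipId(imageData));
    }
}
```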


cheers .. Jason

Re: Export to PDF

Posted: Thu May 13, 2010 11:41 pm
by stuart.ledwich
Hi Jason,

Just trying to look at converting this doc, which contains a picture in the header. Do you know if this is something that we can convert to PDF?

Re: Export to PDF

Posted: Sat May 15, 2010 1:20 am
by jason
Hi Stuart

In principle, it should work.

Where docx2fo.xslt says something like
Code:
<xsl:apply-templates select="java:org.docx4j.model.structure.HeaderFooterPolicy.getFirstHeader($wmlPackage)"/>


it is fetching the XML for the header, and applying the templates to it. This should include image related templates.

If it's not working, you can find out where things are going wrong by setting log4j logging for Conversion to DEBUG, or by calling its setSaveFO method before running the conversion.

That way you'll be able to see the intermediate XSL FO file, where you can look to see what it has produced in the header.
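
Once the FO has been saved, a quick check like the following can tell whether the header or footer produced any image at all. This is a sketch that assumes headers and footers end up inside `fo:static-content` and that images are emitted as `fo:external-graphic`; both are standard XSL-FO elements, but what docx2fo.xslt actually emits should be confirmed against the saved file. `FoHeaderCheck` is a hypothetical helper, not part of docx4j.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class FoHeaderCheck {

    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Counts fo:external-graphic elements nested inside any fo:static-content.
    static int countStaticContentGraphics(String foXml) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(foXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse FO", e);
        }
        NodeList regions = doc.getElementsByTagNameNS(FO_NS, "static-content");
        int count = 0;
        for (int i = 0; i < regions.getLength(); i++) {
            Element region = (Element) regions.item(i);
            count += region.getElementsByTagNameNS(FO_NS, "external-graphic").getLength();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: FoHeaderCheck <saved-fo-file>");
            return;
        }
        // Assumes the saved FO file is UTF-8 encoded.
        String fo = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("Images in static content: " + countStaticContentGraphics(fo));
    }
}
```

A count of zero would suggest the image template never fired for the header, which narrows the search to the stylesheet or the extension function it calls.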

cheers .. Jason

Re: Export to PDF

Posted: Tue May 18, 2010 1:31 am
by stuart.ledwich
Jason,

Thanks for your reply. I tried it, but I seem to get an exception every time I try to convert the document to PDF. It's hitting the following exception.

Code:
8070 [main] ERROR docx4j.XmlUtils  - java.lang.ClassCastException: org.docx4j.openpackaging.parts.WordprocessingML.StyleDefinitionsPart cannot be cast to org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage
; Line#: 505; Column#: 18
javax.xml.transform.TransformerException: java.lang.ClassCastException: org.docx4j.openpackaging.parts.WordprocessingML.StyleDefinitionsPart cannot be cast to org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage


The full exception output is attached as exception.txt.zip. Thanks.

Re: Export to PDF

Posted: Thu May 20, 2010 12:33 am
by jason
Hi Stuart

The problem is that org.docx4j.model.images.WordXmlPictureE10.handleImageRel is assuming the image is a rel of the main document:

BinaryPartAbstractImage part = (BinaryPartAbstractImage)wmlPackage.getMainDocumentPart()
.getRelationshipsPart().getPart(rel);

whereas in this case it is a rel of the header.

We need a way to pass in the source part. This means passing a parameter through the templates, or better, defining a modelState to keep track of which part the XSLT is currently processing.
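
A rough sketch of the second option, with everything in it hypothetical: `SketchPart`, `ConversionState` and their methods are stand-ins written for this example, not docx4j classes. The point is only that the image handler resolves the relationship ID against whichever part the transform is currently in, rather than always against the main document part.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a package part (main document, header, ...) with its own rels.
class SketchPart {
    final String name;
    private final Map<String, String> rels = new HashMap<>(); // relId -> target

    SketchPart(String name) {
        this.name = name;
    }

    SketchPart addRel(String relId, String target) {
        rels.put(relId, target);
        return this;
    }

    String getTarget(String relId) {
        return rels.get(relId);
    }
}

// Tracks which part the transform is currently processing.
class ConversionState {
    private SketchPart currentPart;

    ConversionState(SketchPart mainDocument) {
        this.currentPart = mainDocument;
    }

    // Would be called when the transform moves into a header, footer, etc.
    void setCurrentPart(SketchPart part) {
        this.currentPart = part;
    }

    // Resolves an image rel against the current part, not always the main document.
    String resolveImage(String relId) {
        String target = currentPart.getTarget(relId);
        if (target == null) {
            throw new IllegalStateException(
                "No rel " + relId + " in part " + currentPart.name);
        }
        return target;
    }

    public static void main(String[] args) {
        SketchPart main = new SketchPart("document.xml").addRel("rId1", "styles.xml");
        SketchPart header = new SketchPart("header1.xml").addRel("rId1", "media/image1.png");

        ConversionState state = new ConversionState(main);
        state.setCurrentPart(header);
        System.out.println(state.resolveImage("rId1"));
    }
}
```

Relationship IDs are scoped per part, so the same ID can point at an image in a header and at something else in the main document, which would be consistent with the ClassCastException reported above.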

I've got a fair bit on my plate at the moment, so I'm not sure how soon I'll get to this, even though it is probably only an hour's work. I'll see if I can look at it before next week.

.. Jason

Re: Export to PDF

Posted: Mon May 24, 2010 1:46 am
by jason