
Error when trying to convert a docx with tables to PDF

Posted: Tue Aug 24, 2010 10:43 am
by bzha005
Hi Jason,
When I try to convert your /sample-docs/databinding/invoice.docx to PDF using docx4j 2.5.0, I get the following error. Any ideas?

Exception in thread "main" org.docx4j.openpackaging.exceptions.Docx4JException: FOP issues
at org.docx4j.convert.out.pdf.viaXSLFO.Conversion.output(Conversion.java:341)
at org.docx4j.samples.PublishingPOC.convertToPdf(PublishingPOC.java:57)
at org.docx4j.samples.PublishingPOC.main(PublishingPOC.java:45)
Caused by: javax.xml.transform.TransformerException: org.apache.fop.fo.ValidationException: "fo:table-row" is missing child elements.
Required content model: (table-cell+) (See position 17:189)
at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:501)
at org.docx4j.convert.out.pdf.viaXSLFO.Conversion.output(Conversion.java:333)
... 2 more
Caused by: org.apache.fop.fo.ValidationException: "fo:table-row" is missing child elements.
Required content model: (table-cell+) (See position 17:189)
at org.apache.fop.events.ValidationExceptionFactory.createException(ValidationExceptionFactory.java:38)
at org.apache.fop.events.EventExceptionManager.throwException(EventExceptionManager.java:54)
at org.apache.fop.events.DefaultEventBroadcaster$1.invoke(DefaultEventBroadcaster.java:152)
at $Proxy37.missingChildElement(Unknown Source)
at org.apache.fop.fo.FONode.missingChildElementError(FONode.java:550)
at org.apache.fop.fo.flow.table.TableRow.finalizeNode(TableRow.java:115)
at org.apache.fop.fo.FONode.endOfNode(FONode.java:329)
at org.apache.fop.fo.flow.table.TableRow.endOfNode(TableRow.java:108)
at org.apache.fop.fo.FOTreeBuilder$MainFOHandler.endElement(FOTreeBuilder.java:348)
at org.apache.fop.fo.FOTreeBuilder.endElement(FOTreeBuilder.java:177)
at org.apache.xalan.transformer.TransformerIdentityImpl.endElement(TransformerIdentityImpl.java:1101)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:601)
at com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:180)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:377)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2755)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:484)
... 3 more

Re: Error when trying to convert a docx with tables to PDF

Posted: Tue Aug 24, 2010 12:25 pm
by jason
Thanks for reporting this. This is caused by an sdt (content control) in place of a row.
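To see whether a document has this structure before converting, one option is to look at what sits directly inside each w:tbl in word/document.xml (the usual location of the main part inside the docx zip). Below is a minimal sketch using only the JDK's DOM parser, not docx4j; the class and method names are made up for illustration, and treating anything other than w:tblPr, w:tblGrid and w:tr as "unexpected" is an assumption about what a simple row-by-row converter handles.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TableChildScanner {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumption: the element children a row-by-row converter expects directly under w:tbl.
    static final Set<String> EXPECTED = Set.of("tblPr", "tblGrid", "tr");

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // needed for getLocalName()/getNamespaceURI()
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Lists every WordprocessingML element sitting directly under a w:tbl
    // where a w:tr (or table properties) would normally be, e.g. w:sdt.
    static List<String> unexpectedTableChildren(Document doc) {
        List<String> found = new ArrayList<>();
        NodeList tables = doc.getElementsByTagNameNS(W, "tbl");
        for (int i = 0; i < tables.getLength(); i++) {
            for (Node c = tables.item(i).getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE
                        && W.equals(c.getNamespaceURI())
                        && !EXPECTED.contains(c.getLocalName())) {
                    found.add("w:" + c.getLocalName());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<w:tbl xmlns:w='" + W + "'>"
                + "<w:tblPr/><w:tblGrid/>"
                + "<w:tr><w:tc><w:p/></w:tc></w:tr>"
                + "<w:sdt><w:sdtContent>"
                + "<w:tr><w:tc><w:p/></w:tc></w:tr>"
                + "</w:sdtContent></w:sdt>"
                + "</w:tbl>";
        System.out.println(unexpectedTableChildren(parse(xml))); // prints [w:sdt]
    }
}
```

For a table with an sdt in place of a row, as in the main method, it reports w:sdt; the same check would also flag other elements placed between rows, such as a w:bookmarkStart.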

It is fixed in http://dev.plutext.org/trac/docx4j/changeset/1188

If you drop in http://dev.plutext.org/docx4j/docx4j-ni ... 100824.jar in place of the 2.5.0 jar, you should find it works.
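When dropping a nightly jar in place of a release jar, it is easy to end up with both on the classpath, in which case whichever comes first wins. A small JDK-only sketch (a hypothetical helper, not part of docx4j) that prints where a class was actually loaded from; the class name passed in main is the docx4j conversion class from the stack trace above.

```java
import java.net.URL;
import java.security.CodeSource;

public class ClasspathCheck {
    // Reports the jar or directory a class was loaded from, so you can
    // confirm which copy of a library is actually in use.
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource cs = c.getProtectionDomain().getCodeSource();
            URL loc = (cs == null) ? null : cs.getLocation();
            // JDK classes loaded by the bootstrap loader have no code source.
            return (loc == null) ? "(JDK/bootstrap)" : loc.toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        System.out.println(locate("org.docx4j.convert.out.pdf.viaXSLFO.Conversion"));
    }
}
```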

Re: Error when trying to convert a docx with tables to PDF

Posted: Tue Aug 24, 2010 3:58 pm
by bzha005
Hi Jason,
Thanks for your reply; it works. However, there are still the following issues.
I used invoice.docx as a template to generate my invoice_bound.docx by replacing the custom XML part programmatically. The resulting docx has the expected 3 rows, but after I converted invoice_bound.docx to PDF, the PDF has only one row and the table border is gone. I have zipped and attached the files for your reference.

Re: Error when trying to convert a docx with tables to PDF

Posted: Tue Aug 24, 2010 4:53 pm
by jason
1. http://dev.plutext.org/trac/docx4j/changeset/1189 handles multiple tr in the sdt.

I looked at your docx and the resulting pdf.

2. Looks like an enhancement is required before docx4j supports your shaded row.

3. Also, I notice that the structure:
    <w:p w:rsidRDefault="002E7764" w:rsidP="00926DAD" w:rsidR="00926DAD">
      <w:r>
        <w:t>Joe Bloggs</w:t>
      </w:r>
      <w:proofErr w:type="spellStart"/>
      <w:r>
        <w:t>Joe Bloggs</w:t>
      </w:r>
      <w:proofErr w:type="spellEnd"/>
    </w:p>


is resulting in Joe Bloggs appearing twice in the pdf. I will look into fixing this proofErr in the next day or so. In the meantime, the workaround would be to ensure your document doesn't contain any of these.
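Change 1189 in item 1 is about one sdt wrapping several w:tr. If you only need a PDF and cannot pick up a newer jar, one possible pre-processing workaround is to unwrap such row-level content controls so the table contains plain rows. This is a hypothetical JDK DOM sketch, not what the changeset does; it discards the content control itself, so apply it only to a copy whose bindings have already been resolved.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class SdtRowUnwrapper {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    static boolean isW(Node n, String localName) {
        return n != null && n.getNodeType() == Node.ELEMENT_NODE
                && W.equals(n.getNamespaceURI()) && localName.equals(n.getLocalName());
    }

    static Element firstWChild(Node parent, String localName) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (isW(c, localName)) return (Element) c;
        }
        return null;
    }

    // Replaces every w:sdt sitting directly in a w:tbl with the children of its
    // w:sdtContent (however many w:tr it holds). An sdt nested directly inside
    // another is handled too, because the table is re-checked after each unwrap.
    // Returns the number of sdt elements removed.
    static int unwrapRowSdts(Document doc) {
        NodeList live = doc.getElementsByTagNameNS(W, "tbl");
        List<Element> tables = new ArrayList<>(); // snapshot before modifying the tree
        for (int i = 0; i < live.getLength(); i++) tables.add((Element) live.item(i));

        int removed = 0;
        for (Element tbl : tables) {
            Element sdt;
            while ((sdt = firstWChild(tbl, "sdt")) != null) {
                Element content = firstWChild(sdt, "sdtContent");
                while (content != null && content.getFirstChild() != null) {
                    tbl.insertBefore(content.getFirstChild(), sdt); // moves the node
                }
                tbl.removeChild(sdt); // drops w:sdtPr, i.e. the content control itself
                removed++;
            }
        }
        return removed;
    }

    // Counts w:tr elements directly under the first table.
    static int rowCount(Document doc) {
        Element tbl = (Element) doc.getElementsByTagNameNS(W, "tbl").item(0);
        int n = 0;
        for (Node c = tbl.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (isW(c, "tr")) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String row = "<w:tr><w:tc><w:p/></w:tc></w:tr>";
        String xml = "<w:tbl xmlns:w='" + W + "'>" + row
                + "<w:sdt><w:sdtPr/><w:sdtContent>" + row + row + row + "</w:sdtContent></w:sdt>"
                + "</w:tbl>";
        Document doc = parse(xml);
        int removed = unwrapRowSdts(doc);
        System.out.println(removed + " sdt removed, " + rowCount(doc) + " rows"); // prints "1 sdt removed, 4 rows"
    }
}
```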


Re: Error when trying to convert a docx with tables to PDF

Posted: Wed Aug 25, 2010 10:08 am
by bzha005
Hi Jason,

I found some more related errors. There are 2 small Word files in the attached zip file. I opened simple-test.doc in Word 2007, saved it as simple-test.docx, then tried to convert simple-test.docx to pdf using docx4j and I got the following error.

Exception in thread "main" org.docx4j.openpackaging.exceptions.Docx4JException: FOP issues
at org.docx4j.convert.out.pdf.viaXSLFO.Conversion.output(Conversion.java:347)
at org.docx4j.samples.PublishingPOC.convertToPdf(PublishingPOC.java:72)
at org.docx4j.samples.PublishingPOC.main(PublishingPOC.java:50)
Caused by: javax.xml.transform.TransformerException: org.apache.fop.fo.ValidationException: "fo:table-cell" is missing child elements.
Required content model: marker* (%block;)+ (See position 3:286)
at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:501)
at org.docx4j.convert.out.pdf.viaXSLFO.Conversion.output(Conversion.java:339)
... 2 more
Caused by: org.apache.fop.fo.ValidationException: "fo:table-cell" is missing child elements.
Required content model: marker* (%block;)+ (See position 3:286)
at org.apache.fop.events.ValidationExceptionFactory.createException(ValidationExceptionFactory.java:38)
at org.apache.fop.events.EventExceptionManager.throwException(EventExceptionManager.java:54)
at org.apache.fop.events.DefaultEventBroadcaster$1.invoke(DefaultEventBroadcaster.java:152)
at $Proxy37.missingChildElement(Unknown Source)
at org.apache.fop.fo.FONode.missingChildElementError(FONode.java:564)
at org.apache.fop.fo.flow.table.TableCell.finalizeNode(TableCell.java:113)
at org.apache.fop.fo.FONode.endOfNode(FONode.java:329)
at org.apache.fop.fo.flow.table.TableCell.endOfNode(TableCell.java:105)
at org.apache.fop.fo.FOTreeBuilder$MainFOHandler.endElement(FOTreeBuilder.java:348)
at org.apache.fop.fo.FOTreeBuilder.endElement(FOTreeBuilder.java:177)
at org.apache.xalan.transformer.TransformerIdentityImpl.endElement(TransformerIdentityImpl.java:1101)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:601)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2938)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:484)
... 3 more

Re: Error when trying to convert a docx with tables to PDF

Posted: Wed Aug 25, 2010 3:49 pm
by jason
bzha005 wrote: tried to convert simple-test.docx to pdf using docx4j and I got the following error.


Works for me (i.e. no FOP error) using the current svn tip.

Note that your document contains fields, in particular ones that set and reference variables. docx4j PDF output does not honour fields, except for page number fields.

Support for those could be added, or you could consider using the data binding stuff instead.
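To find out which fields a document uses before converting it, you can list the field instructions in word/document.xml. Simple fields keep the instruction in the w:instr attribute of w:fldSimple; complex fields spread it over w:instrText runs between a w:fldChar of type begin and one of type end. The sketch below uses only the JDK DOM API; the names are illustrative, and treating only PAGE as supported is an assumption based on the remark above about page number fields.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FieldScanner {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Collects field instructions: fldSimple when it is encountered, complex
    // fields when their closing fldChar (type "end") is reached.
    static List<String> fieldInstructions(Document doc) {
        List<String> out = new ArrayList<>();
        Deque<StringBuilder> open = new ArrayDeque<>(); // a stack, since fields can nest
        NodeList all = doc.getElementsByTagNameNS(W, "*"); // all w: elements, document order
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String name = e.getLocalName();
            if (name.equals("fldSimple")) {
                out.add(e.getAttributeNS(W, "instr").trim());
            } else if (name.equals("instrText") && !open.isEmpty()) {
                open.peek().append(e.getTextContent());
            } else if (name.equals("fldChar")) {
                String type = e.getAttributeNS(W, "fldCharType");
                if (type.equals("begin")) {
                    open.push(new StringBuilder());
                } else if (type.equals("end") && !open.isEmpty()) {
                    out.add(open.pop().toString().trim());
                }
            }
        }
        return out;
    }

    // Assumption: only PAGE fields count as supported by the PDF output.
    static List<String> likelyUnsupported(List<String> instructions) {
        List<String> out = new ArrayList<>();
        for (String instr : instructions) {
            String keyword = instr.split("\\s+")[0];
            if (!keyword.equalsIgnoreCase("PAGE")) out.add(instr);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<w:body xmlns:w='" + W + "'>"
                + "<w:p><w:fldSimple w:instr=' PAGE '><w:r><w:t>1</w:t></w:r></w:fldSimple></w:p>"
                + "<w:p><w:r><w:fldChar w:fldCharType='begin'/></w:r>"
                + "<w:r><w:instrText xml:space='preserve'> SET total 42 </w:instrText></w:r>"
                + "<w:r><w:fldChar w:fldCharType='end'/></w:r></w:p>"
                + "<w:p><w:r><w:fldChar w:fldCharType='begin'/></w:r>"
                + "<w:r><w:instrText> REF total </w:instrText></w:r>"
                + "<w:r><w:fldChar w:fldCharType='separate'/></w:r>"
                + "<w:r><w:t>42</w:t></w:r>"
                + "<w:r><w:fldChar w:fldCharType='end'/></w:r></w:p>"
                + "</w:body>";
        List<String> instrs = fieldInstructions(parse(xml));
        System.out.println(instrs);                    // prints [PAGE, SET total 42, REF total]
        System.out.println(likelyUnsupported(instrs)); // prints [SET total 42, REF total]
    }
}
```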

Re: Error when trying to convert a docx with tables to PDF

Posted: Fri Aug 27, 2010 4:48 pm
by jason
jason wrote: is resulting in Joe Bloggs appearing twice in the pdf. I will look into fixing this proofErr in the next day or so. In the meantime, the workaround would be to ensure your document doesn't contain any of these.


Looking at this more closely, things seem to be ok as they are.

The original Word document had:

    <w:p w:rsidRDefault="002E7764" w:rsidP="00926DAD" w:rsidR="00926DAD">
      <w:r>
        <w:t>Joe Bloggs</w:t>
      </w:r>
      <w:proofErr w:type="spellStart"/>
      <w:r>
        <w:t>Joe Bloggs</w:t>
      </w:r>
      <w:proofErr w:type="spellEnd"/>
    </w:p>


When opened in Word, it showed only one "Joe Bloggs". This isn't because the second is surrounded by w:proofErr, but because Word performs the binding, replacing the paragraph contents with the bound XML value (which in this case was also "Joe Bloggs").

So the docx4j PDF output should produce the same result, provided the binding is resolved first.
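To illustrate what "resolving the binding first" means, here is a deliberately simplified stand-in for data binding, written against the JDK DOM and XPath APIs rather than docx4j's own binding code (which is what you would use in practice). It assumes the custom XML has no namespaces and that each w:dataBinding/@w:xpath selects a plain text value; real Word bindings usually use prefixed XPaths plus a w:storeItemID identifying the custom XML part, which this sketch ignores. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class BindingSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    static boolean isW(Node n, String localName) {
        return n != null && n.getNodeType() == Node.ELEMENT_NODE
                && W.equals(n.getNamespaceURI()) && localName.equals(n.getLocalName());
    }

    // For each content control with a w:dataBinding, evaluates its w:xpath against
    // the custom XML and replaces the bound paragraph's runs (and w:proofErr
    // markers) with a single run holding the result.
    static void applyBindings(Document doc, Document customXml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList live = doc.getElementsByTagNameNS(W, "dataBinding");
        List<Element> bindings = new ArrayList<>(); // snapshot: the tree is modified below
        for (int i = 0; i < live.getLength(); i++) bindings.add((Element) live.item(i));

        for (Element binding : bindings) {
            Node sdt = binding.getParentNode().getParentNode(); // dataBinding -> sdtPr -> sdt
            Element content = null;
            for (Node c = sdt.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (isW(c, "sdtContent")) content = (Element) c;
            }
            if (content == null) continue;

            String value;
            try {
                value = xpath.evaluate(binding.getAttributeNS(W, "xpath"), customXml);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad binding XPath", e);
            }

            // Block-level control: rewrite its first paragraph; otherwise the content itself.
            NodeList ps = content.getElementsByTagNameNS(W, "p");
            Element target = ps.getLength() > 0 ? (Element) ps.item(0) : content;
            for (Node c = target.getFirstChild(); c != null; ) {
                Node next = c.getNextSibling();
                if (!isW(c, "pPr")) target.removeChild(c); // keep paragraph properties only
                c = next;
            }
            Element r = doc.createElementNS(W, "w:r");
            Element t = doc.createElementNS(W, "w:t");
            t.setTextContent(value);
            r.appendChild(t);
            target.appendChild(r);
        }
    }

    // Concatenates all w:t text in document order.
    static String text(Document doc) {
        StringBuilder sb = new StringBuilder();
        NodeList ts = doc.getElementsByTagNameNS(W, "t");
        for (int i = 0; i < ts.getLength(); i++) sb.append(ts.item(i).getTextContent());
        return sb.toString();
    }

    public static void main(String[] args) {
        String docXml = "<w:sdt xmlns:w='" + W + "'><w:sdtPr>"
                + "<w:dataBinding w:xpath='/invoice/customer'/></w:sdtPr><w:sdtContent>"
                + "<w:p><w:r><w:t>Joe Bloggs</w:t></w:r><w:proofErr w:type='spellStart'/>"
                + "<w:r><w:t>Joe Bloggs</w:t></w:r><w:proofErr w:type='spellEnd'/></w:p>"
                + "</w:sdtContent></w:sdt>";
        Document doc = parse(docXml);
        System.out.println(text(doc)); // prints Joe BloggsJoe Bloggs
        applyBindings(doc, parse("<invoice><customer>Joe Bloggs</customer></invoice>"));
        System.out.println(text(doc)); // prints Joe Bloggs
    }
}
```

The main method mirrors the case in this thread: the paragraph with the doubled "Joe Bloggs" and its w:proofErr markers collapses to a single run once the binding is applied.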

Re: Error when trying to convert a docx with tables to PDF

Posted: Tue Jan 08, 2013 6:04 pm
by ceugster
Hi, I'm adding my post to this thread because I ran into the same issue with the current versions 2.8.1 and 2.9.0-SNAPSHOT, and the proposed workarounds did not help (the link is invalid). Maybe you have a solution to my problem: I tried to convert the enclosed docx file to PDF. The docx file was generated from a template by a tool named OfficeAtWork. The exception thrown is:

org.docx4j.openpackaging.exceptions.Docx4JException: FOP issues
at org.docx4j.convert.out.pdf.viaXSLFO.Conversion.output(Conversion.java:378)
at ch.eugster.word2pdf.Main.main(Main.java:33)
Caused by: javax.xml.transform.TransformerException: org.apache.fop.fo.ValidationException: "fo:table-cell" is missing child elements. Required content model: marker* (%block;)+ (See position 73:1316)
at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:502)
at org.docx4j.convert.out.pdf.viaXSLFO.Conversion.output(Conversion.java:366)
... 1 more
Caused by: org.apache.fop.fo.ValidationException: "fo:table-cell" is missing child elements. Required content model: marker* (%block;)+ (See position 73:1316)
at org.apache.fop.events.ValidationExceptionFactory.createException(ValidationExceptionFactory.java:38)
at org.apache.fop.events.EventExceptionManager.throwException(EventExceptionManager.java:58)
at org.apache.fop.events.DefaultEventBroadcaster$1.invoke(DefaultEventBroadcaster.java:175)
at $Proxy37.missingChildElement(Unknown Source)
at org.apache.fop.fo.FONode.missingChildElementError(FONode.java:589)
at org.apache.fop.fo.flow.table.TableCell.finalizeNode(TableCell.java:116)
at org.apache.fop.fo.FONode.endOfNode(FONode.java:330)
at org.apache.fop.fo.flow.table.TableCell.endOfNode(TableCell.java:108)
at org.apache.fop.fo.FOTreeBuilder$MainFOHandler.endElement(FOTreeBuilder.java:347)
at org.apache.fop.fo.FOTreeBuilder.endElement(FOTreeBuilder.java:181)
at org.apache.xalan.transformer.TransformerIdentityImpl.endElement(TransformerIdentityImpl.java:1102)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:485)
... 2 more

Is there a solution for this?

Thank you!

Christian

Re: Error when trying to convert a docx with tables to PDF

Posted: Tue Jan 08, 2013 8:31 pm
by jason
Thanks for reporting this. The problem was a bookmarkStart between table rows.

https://github.com/plutext/docx4j/commi ... 0f8a16301f fixes this issue.
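If upgrading is not immediately possible, one hypothetical pre-processing workaround for a bookmarkStart between table rows is to remove such table-level bookmarks from word/document.xml before conversion. The JDK DOM sketch below is not necessarily what the linked commit does; it removes both markers of each affected bookmark (matched by w:id), which means any REF field or hyperlink targeting that bookmark will no longer find it. Names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class BookmarkStripper {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    static boolean isW(Node n, String localName) {
        return n != null && n.getNodeType() == Node.ELEMENT_NODE
                && W.equals(n.getNamespaceURI()) && localName.equals(n.getLocalName());
    }

    // Snapshot (not live) list of all w:bookmarkStart and w:bookmarkEnd elements.
    static List<Element> bookmarkMarkers(Document doc) {
        List<Element> out = new ArrayList<>();
        for (String name : new String[] {"bookmarkStart", "bookmarkEnd"}) {
            NodeList nl = doc.getElementsByTagNameNS(W, name);
            for (int i = 0; i < nl.getLength(); i++) out.add((Element) nl.item(i));
        }
        return out;
    }

    // Finds bookmarks whose start or end sits directly under w:tbl (between rows)
    // and removes both markers of each such bookmark, wherever the other one is.
    static int stripTableLevelBookmarks(Document doc) {
        Set<String> ids = new HashSet<>();
        for (Element e : bookmarkMarkers(doc)) {
            if (isW(e.getParentNode(), "tbl")) ids.add(e.getAttributeNS(W, "id"));
        }
        int removed = 0;
        for (Element e : bookmarkMarkers(doc)) {
            if (ids.contains(e.getAttributeNS(W, "id"))) {
                e.getParentNode().removeChild(e);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        String xml = "<w:tbl xmlns:w='" + W + "'>"
                + "<w:tr><w:tc><w:p><w:bookmarkStart w:id='1' w:name='keep'/>"
                + "<w:bookmarkEnd w:id='1'/></w:p></w:tc></w:tr>"
                + "<w:bookmarkStart w:id='0' w:name='between'/>"
                + "<w:tr><w:tc><w:p><w:bookmarkEnd w:id='0'/></w:p></w:tc></w:tr>"
                + "</w:tbl>";
        System.out.println(stripTableLevelBookmarks(parse(xml))); // prints 2
    }
}
```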

I notice your document uses a variety of fields liberally. I'm afraid the PDF output doesn't handle these at present. I think one of the other docx4j developers is planning to do some work on this, so this might change relatively soon, but no promises.

Re: Error when trying to convert a docx with tables to PDF

Posted: Wed Jan 09, 2013 5:02 pm
by ceugster
Hi Jason,

thank you for the quick response!

Christian