Page breaks in pdf's

Posted: Wed Jun 20, 2012 9:13 pm
by jordi
I have a template in .docx format which I want to fill with data and convert to pdf.
I can see that the page break tags (<w:br w:type="page"/>) are present in the docx file after the OpenDoPE binding process, but they never take effect in the pdf.

Is this a known limitation, or is it simply a bug?

Thanks.

Re: Page breaks in pdf's

Posted: Wed Jun 20, 2012 10:21 pm
by jason
It is a bug.

src/main/java/org/docx4j/convert/out/pdf/viaXSLFO/Conversion.java invokes PageBreak.movePageBreaks

This should convert:

    <w:p>
      <w:r>
        <w:br w:type="page"/>
      </w:r>
    </w:p>
 


to:

    <w:p>
      <w:pPr>
        <w:pageBreakBefore/>
      </w:pPr>
    </w:p>
 


The reason:
    * If a page-break w:br w:type="page" is found within a run with some formatting applied to it
    * then it will be generated into an fo:inline tag. This page break will be ignored by fop. This class
    * moves the page-breaks to the enclosing block.


Either this isn't happening in your case, or the effective pPr is being calculated as something else.

Could you please post the XML for entire w:p containing your w:br?
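
To make the intended rewrite concrete, here is a minimal sketch of the same idea using only the JDK's DOM API. It is not the docx4j PageBreak implementation: the class name PageBreakSketch and its helper methods are hypothetical, it only handles the simple w:p/w:r/w:br shape, and it assumes the break is the first content of the paragraph (otherwise w:pageBreakBefore would not be equivalent).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of docx4j.
final class PageBreakSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Replaces w:r/w:br[@w:type='page'] with w:pPr/w:pageBreakBefore on the enclosing w:p. */
    static String movePageBreaks(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // Copy the live NodeList first, because the loop below mutates the tree.
            NodeList brs = doc.getElementsByTagNameNS(W, "br");
            List<Element> pageBreaks = new ArrayList<>();
            for (int i = 0; i < brs.getLength(); i++) {
                Element br = (Element) brs.item(i);
                if ("page".equals(br.getAttributeNS(W, "type"))) pageBreaks.add(br);
            }
            for (Element br : pageBreaks) {
                Node run = br.getParentNode();
                if (!isW(run, "r") || !isW(run.getParentNode(), "p")) continue;
                Node p = run.getParentNode();
                run.removeChild(br);
                if (!hasContent(run)) p.removeChild(run); // drop a run left with only w:rPr
                Element pPr = child(p, "pPr");
                if (pPr == null) {
                    pPr = doc.createElementNS(W, "w:pPr");
                    p.insertBefore(pPr, p.getFirstChild()); // w:pPr must come first in w:p
                }
                if (child(pPr, "pageBreakBefore") == null) {
                    // Schema order: pStyle, keepNext, keepLines precede pageBreakBefore.
                    Node ref = pPr.getFirstChild();
                    while (ref != null && (!(ref instanceof Element) || isW(ref, "pStyle")
                            || isW(ref, "keepNext") || isW(ref, "keepLines"))) {
                        ref = ref.getNextSibling();
                    }
                    pPr.insertBefore(doc.createElementNS(W, "w:pageBreakBefore"), ref);
                }
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean isW(Node n, String local) {
        return n instanceof Element && W.equals(n.getNamespaceURI()) && local.equals(n.getLocalName());
    }

    static Element child(Node parent, String local) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (isW(c, local)) return (Element) c;
        }
        return null;
    }

    static boolean hasContent(Node run) {
        for (Node c = run.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && !isW(c, "rPr")) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(movePageBreaks("<w:p xmlns:w=\"" + W + "\"><w:r>"
                + "<w:br w:type=\"page\"/></w:r></w:p>"));
    }
}
```

Because the break ends up as a paragraph property, the XSL-FO stage can emit it on the fo:block itself rather than inside an fo:inline.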

Re: Page breaks in pdf's

Posted: Wed Jun 20, 2012 10:58 pm
by jordi
Sure. How can I get this xml structure from the pdf document?

I can give you the code of the .fo export:
<fo:block color="#548DD4" font-family="Calibri" font-size="11.0pt" line-height="115%" space-after="4mm">
  <inline xmlns="http://www.w3.org/1999/XSL/Format" color="#548DD4" font-family="Calibri" font-size="11.0pt">
    <block break-before="page"/>
  </inline>
</fo:block>


And the code from the word export:
<w:p w:rsidRDefault="00E13AD8" w:rsidR="00E13AD8">
  <w:pPr>
    <w:rPr>
      <w:color w:val="548DD4" w:themeColor="text2" w:themeTint="99"/>
      <w:lang w:val="en-US"/>
    </w:rPr>
  </w:pPr>
  <w:r>
    <w:rPr>
      <w:color w:val="548DD4" w:themeColor="text2" w:themeTint="99"/>
      <w:lang w:val="en-US"/>
    </w:rPr>
    <w:br w:type="page"/>
  </w:r>
</w:p>


Both seem to be correct. Thanks.

Re: Page breaks in pdf's

Posted: Wed Jun 20, 2012 11:30 pm
by jason
I took the w:p you posted, and ran a docx containing that through PDF export.

It was correctly converted to w:pPr/w:pageBreakBefore, and then to an fo:block with @break-before="page".

The only way I can see to get your result is if the PageBreak class isn't processing your w:p at all.

It only processes top level paragraphs (ie not those in a content control, or a table etc). Does this explain your case?
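
To illustrate what "top level" means here, the following sketch (plain JDK DOM, not docx4j code; the class and method names are hypothetical) counts the w:p elements that are direct children of w:body and compares that with every w:p in the body. Paragraphs inside w:sdt or w:tbl only show up in the second count, so a pass that walks only the direct children never sees them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of docx4j.
final class TopLevelSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts w:p elements that are direct children of the root (assumed to be w:body). */
    static int topLevelParagraphs(String bodyXml) {
        int count = 0;
        for (Node c = parseRoot(bodyXml).getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && W.equals(c.getNamespaceURI())
                    && "p".equals(c.getLocalName())) {
                count++;
            }
        }
        return count;
    }

    /** Counts every w:p below the root, including those in content controls and tables. */
    static int allParagraphs(String bodyXml) {
        return parseRoot(bodyXml).getElementsByTagNameNS(W, "p").getLength();
    }

    public static void main(String[] args) {
        String body = "<w:body xmlns:w=\"" + W + "\"><w:p/>"
                + "<w:sdt><w:sdtContent><w:p/></w:sdtContent></w:sdt></w:body>";
        System.out.println(topLevelParagraphs(body) + " top level, "
                + allParagraphs(body) + " in total");
    }
}
```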

Re: Page breaks in pdf's

Posted: Thu Jun 21, 2012 1:49 am
by jordi
"It only processes top level paragraphs (ie not those in a content control, or a table etc). Does this explain your case?"


Yes. Thank you. I thought I was going crazy with this.
I can fix most of the template by moving the breaks outside the controls, but there is one case where that cannot be done: I have a control that spans several pages, each one with its own title bar.
Can the breaks be handled inside the controls somehow, by calling some other method or something similar?

I can attach the template if you need to see it. Thanks.

Re: Page breaks in pdf's

Posted: Thu Jun 21, 2012 10:40 pm
by jason
The PageBreak class should be fixed so that it also processes w:p elements inside a content control. I've created https://github.com/plutext/docx4j/issues/9 to track this issue.

Until this is done, you can work around the problem by stripping the content controls after the binding is done (and before you create the PDF, obviously) using RemovalHandler. See https://github.com/plutext/docx4j/blob/ ... sions.java for an example of usage.
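
For anyone wondering what stripping a content control amounts to at the XML level, here is a rough sketch with the JDK's DOM API. It does not use RemovalHandler and is not how docx4j implements it; the class SdtUnwrapSketch is hypothetical and the input is assumed to be a w:body fragment. Each w:sdt is replaced by the children of its w:sdtContent, so the paragraphs inside become direct children of the body, where the page break handling can find them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of docx4j.
final class SdtUnwrapSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Replaces every w:sdt with the children of its w:sdtContent. */
    static String unwrapContentControls(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // Snapshot the live NodeList, since unwrapping changes the tree.
            NodeList found = doc.getElementsByTagNameNS(W, "sdt");
            List<Element> sdts = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) sdts.add((Element) found.item(i));

            for (Element sdt : sdts) {
                Node parent = sdt.getParentNode();
                if (!(parent instanceof Element)) continue; // leave a root-level w:sdt alone
                Element content = null;
                for (Node c = sdt.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if (c instanceof Element && W.equals(c.getNamespaceURI())
                            && "sdtContent".equals(c.getLocalName())) {
                        content = (Element) c;
                    }
                }
                if (content != null) {
                    // insertBefore moves each node, so the loop drains w:sdtContent.
                    while (content.getFirstChild() != null) {
                        parent.insertBefore(content.getFirstChild(), sdt);
                    }
                }
                parent.removeChild(sdt); // w:sdtPr and the empty wrapper are discarded
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unwrapContentControls("<w:body xmlns:w=\"" + W + "\"><w:sdt>"
                + "<w:sdtPr/><w:sdtContent><w:p/></w:sdtContent></w:sdt></w:body>"));
    }
}
```

In a real docx4j pipeline, RemovalHandler from the linked example is the supported way to do this; the sketch is only meant to show why the workaround helps.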

Re: Page breaks in pdf's

Posted: Fri Jun 22, 2012 12:27 am
by jordi
Many thanks. Will do as in the example. :)