
docx to pdf page numbers

Posted: Fri Aug 06, 2010 5:07 pm
by bizzy
I'm using docx4j to convert docx files to PDFs.

Page numbers in the footer are not being rendered correctly though.
Suppose I have a 3-page docx document: every page in the generated PDF shows the same page number, which is either 1 or the total page count.
I'm using the org.docx4j.convert.out.pdf.viaXSLFO.Conversion converter - overall it produces really good results.

Any help or pointers resolving this would be very much appreciated.

Thanks.

Sample code...


import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class DocxToPdfTest {

    public static void test() throws Exception {
        String docxFile = "test-file-with-page-numbers-r.docx";

        // Load the docx and pick the XSL-FO based PDF converter
        WordprocessingMLPackage pkg = WordprocessingMLPackage.load(new File(docxFile));
        org.docx4j.convert.out.pdf.PdfConversion c
                = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(pkg);
                // = new org.docx4j.convert.out.pdf.viaIText.Conversion(pkg);

        String pdfFile = changeExtensionToPdf(docxFile);
        OutputStream os = new FileOutputStream(pdfFile);
        try {
            c.output(os);
        } finally {
            os.close(); // close the stream even if the conversion throws
        }

        System.out.println("Finished");
    }

    // Replaces the trailing ".docx" with ".pdf"
    private static String changeExtensionToPdf(String path) {
        int markerIndex = path.lastIndexOf(".docx");
        if (markerIndex < 0) {
            throw new IllegalArgumentException("Not a .docx path: " + path);
        }
        return path.substring(0, markerIndex) + ".pdf";
    }

    public static void main(String[] args) {
        try {
            DocxToPdfTest.test();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Re: docx to pdf page numbers

Posted: Sat Aug 07, 2010 8:28 am
by jason
Hi

I just added support for this, see http://dev.plutext.org/trac/docx4j/changeset/1182

I'll probably upload a nightly build sometime over the weekend.

cheers .. Jason

Re: docx to pdf page numbers

Posted: Mon Aug 09, 2010 12:24 pm
by bizzy
Hi Jason

Thanks for getting around to that so quickly.
I got your latest updates from subversion.

Page numbers are still not being generated correctly though.
All page numbers are now being set to the max page number.


Thanks

Re: docx to pdf page numbers

Posted: Mon Aug 09, 2010 5:45 pm
by jason
You might want to check that the updates in the patch were correctly applied to your working copy.

If yes, please post a test case showing the problem.

It should work, assuming you are using:

Code:
        <w:fldSimple w:instr=" PAGE   \* MERGEFORMAT ">
          <w:r>
            <w:rPr>
              <w:noProof/>
            </w:rPr>
            <w:t>J</w:t>
          </w:r>
        </w:fldSimple>


Certain number formats aren't supported, but that doesn't explain your problem.

I tested it in the header part, but it should also work in the main document part and the footer part.

.. Jason

Re: docx to pdf page numbers

Posted: Wed Aug 11, 2010 5:58 pm
by bizzy
Yeah, the updates were applied correctly. I'm working directly off the source and debugged into some of the new code you added in the Conversion class.

...and now to show my lack of knowledge about docx...
I printed the XML from the docx file. I couldn't find the XML you mentioned, but I'm basically only guessing where to look for it.
The xml is very different to your xml in your last post. I've pasted in the xml from the word/footer1.xml below. This is from a .docx file from ms-word 2003.

I can't attach .docx files to posts, so no sample doc.

Thanks

Code:
<w:ftr >
    <w:p w:rsidR="00CA5C13" w:rsidRDefault="00CA5C13" w:rsidP="001C6179">
        <w:pPr >
            <w:pStyle w:val="Footer">
            </w:pStyle>
            <w:framePr w:wrap="around" w:vAnchor="text" w:hAnchor="margin" w:xAlign="right" w:y="1">
            </w:framePr>
            <w:rPr >
                <w:rStyle w:val="PageNumber">
                </w:rStyle>
            </w:rPr>
        </w:pPr>
        <w:r >
            <w:rPr >
                <w:rStyle w:val="PageNumber">
                </w:rStyle>
            </w:rPr>
            <w:fldChar w:fldCharType="begin">
            </w:fldChar>
        </w:r>
        <w:r >
            <w:rPr >
                <w:rStyle w:val="PageNumber">
                </w:rStyle>
            </w:rPr>
            <w:instrText xml:space="preserve">
                PAGE 
            </w:instrText>
        </w:r>
        <w:r >
            <w:rPr >
                <w:rStyle w:val="PageNumber">
                </w:rStyle>
            </w:rPr>
            <w:fldChar w:fldCharType="end">
            </w:fldChar>
        </w:r>
    </w:p>
    <w:p w:rsidR="00CA5C13" w:rsidRDefault="00CA5C13" w:rsidP="00127C57">
        <w:pPr >
            <w:pStyle w:val="Footer">
            </w:pStyle>
            <w:ind w:right="360">
            </w:ind>
        </w:pPr>
    </w:p>
</w:ftr>
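
For readers comparing the two snippets in this thread: the same PAGE field can be stored either as a single w:fldSimple element (as in the earlier post) or as a "complex" field made of runs containing w:fldChar begin, w:instrText, and w:fldChar end (as in this footer). The following is a minimal sketch, using only the JDK's DOM parser rather than docx4j, of how one might check which form a part uses. The class and method names are hypothetical, and it assumes the whole instruction sits in a single w:instrText element, which Word does not guarantee.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of docx4j.
public class PageFieldCounter {

    public static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Parses a WordprocessingML fragment with namespace awareness on. */
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** True if the first keyword is PAGE, so NUMPAGES or PAGEREF do not match. */
    public static boolean isPageInstruction(String instr) {
        String[] tokens = instr.trim().split("\\s+");
        return tokens[0].equalsIgnoreCase("PAGE");
    }

    /** Counts w:fldSimple elements whose w:instr is a PAGE instruction. */
    public static int countSimplePageFields(Document doc) {
        NodeList fields = doc.getElementsByTagNameNS(W_NS, "fldSimple");
        int count = 0;
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            if (isPageInstruction(field.getAttributeNS(W_NS, "instr"))) {
                count++;
            }
        }
        return count;
    }

    /** Counts w:instrText elements holding a PAGE instruction (simplified). */
    public static int countComplexPageFields(Document doc) {
        NodeList instrs = doc.getElementsByTagNameNS(W_NS, "instrText");
        int count = 0;
        for (int i = 0; i < instrs.getLength(); i++) {
            if (isPageInstruction(instrs.item(i).getTextContent())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Cut-down version of the footer paragraph from this post.
        String footer = "<w:ftr xmlns:w='" + W_NS + "'><w:p>"
                + "<w:r><w:fldChar w:fldCharType='begin'/></w:r>"
                + "<w:r><w:instrText xml:space='preserve'> PAGE </w:instrText></w:r>"
                + "<w:r><w:fldChar w:fldCharType='end'/></w:r>"
                + "</w:p></w:ftr>";
        Document doc = parse(footer);
        System.out.println("simple PAGE fields:  " + countSimplePageFields(doc));
        System.out.println("complex PAGE fields: " + countComplexPageFields(doc));
    }
}
```

To run this against a real file, one would first extract word/footer1.xml from the docx (it is a zip archive) and pass its contents to parse.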

Re: docx to pdf page numbers

Posted: Thu Aug 12, 2010 7:05 am
by jason
ok, that looks pretty straightforward.

you can post a sample docx, provided you first rename it to .zip. could you please do that?

i'll incorporate support for it soon after seeing the sample docx. thanks .. Jason

Re: docx to pdf page numbers

Posted: Thu Aug 12, 2010 11:10 am
by bizzy
Thanks Jason.

I've renamed and attached the docx file.
I just added .zip onto the end of the filename. So the real file name should be kb-test-22-footer-page-number-only-r.docx

Thanks again.

Re: docx to pdf page numbers

Posted: Fri Aug 13, 2010 8:49 pm
by jason
OK, it's done, see http://dev.plutext.org/trac/docx4j/changeset/1185

Note that your document right aligns the numbers, using:

Code:
<w:framePr w:wrap="around" w:vAnchor="text" w:hAnchor="margin" w:xAlign="right" w:y="1"/>


That w:xAlign is ignored in the PDF output. As a workaround, you could replace this w:framePr with the justification paragraph property (w:jc) set to right.
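
One way to apply that workaround to the footer XML is sketched below, again with the JDK's DOM and XSLT classes rather than docx4j. It swaps each right-aligned w:framePr for a w:jc element with w:val="right". The class and method names are hypothetical. Replacing the element in place keeps a valid child order for the paragraph in this thread (w:pStyle, then w:jc, then w:rPr), but that is an assumption that may not hold for paragraphs carrying other properties.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of docx4j.
public class FramePrToJc {

    public static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the XML with right-aligned w:framePr elements replaced by w:jc. */
    public static String rightAlignInsteadOfFrame(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            NodeList frames = doc.getElementsByTagNameNS(W_NS, "framePr");
            // The NodeList is live, so walk it backwards while replacing nodes.
            for (int i = frames.getLength() - 1; i >= 0; i--) {
                Element frame = (Element) frames.item(i);
                if (!"right".equals(frame.getAttributeNS(W_NS, "xAlign"))) {
                    continue; // leave frames with other alignments untouched
                }
                Element jc = doc.createElementNS(W_NS, "w:jc");
                jc.setAttributeNS(W_NS, "w:val", "right");
                frame.getParentNode().replaceChild(jc, frame);
            }

            // Serialize the modified DOM back to a string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite XML", e);
        }
    }

    public static void main(String[] args) {
        // Paragraph properties taken from the footer posted earlier in the thread.
        String pPr = "<w:pPr xmlns:w='" + W_NS + "'>"
                + "<w:pStyle w:val='Footer'/>"
                + "<w:framePr w:wrap='around' w:vAnchor='text' w:hAnchor='margin'"
                + " w:xAlign='right' w:y='1'/>"
                + "<w:rPr><w:rStyle w:val='PageNumber'/></w:rPr>"
                + "</w:pPr>";
        System.out.println(rightAlignInsteadOfFrame(pPr));
    }
}
```

The same change can also be made by hand in Word, by removing the frame around the page number and right-aligning the footer paragraph.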

Re: docx to pdf page numbers

Posted: Mon Aug 16, 2010 11:07 am
by bizzy
Page numbers are looking good.

Thanks a lot Jason

Re: docx to pdf page numbers

Posted: Mon Aug 16, 2010 6:05 pm
by jason
No worries :-)