
Docx4j: Convert to PDF, formatting problems

Posted: Wed Jun 04, 2014 10:14 pm
by AndyRandy
Hello

I am writing a Java application that converts a DOCX file to PDF. Unfortunately, the output ignores some of the formatting in the DOCX file, such as:

1. centered header image (jpg), output is left aligned
2. columns from docx are ignored, pdf writes text underneath

I am using these lines of code for conversion:

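Code:
import java.io.*;
import org.apache.commons.io.FilenameUtils;
import org.docx4j.Docx4J;
import org.docx4j.convert.out.FOSettings;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

// temp, file and Systemkit come from the surrounding application
File pdffile = new File(temp+"/_"+FilenameUtils.removeExtension(file.getName())+".pdf");
WordprocessingMLPackage document = WordprocessingMLPackage.load(file);
Mapper fontMapper = new IdentityPlusMapper();
document.setFontMapper(fontMapper);
File folder = Systemkit.createRandomDirectory("tmp/files/");
FOSettings fo = Docx4J.createFOSettings();
// Dump the intermediate XSL FO so it can be inspected when the layout is wrong
fo.setFoDumpFile(new File(folder.getAbsolutePath()+"/report.fo"));
fo.setWmlPackage(document);
// try-with-resources closes the stream even if the conversion throws
try (OutputStream os = new FileOutputStream(pdffile)) {
    Docx4J.toFO(fo, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
}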


Is there a way to preserve this formatting, either by changing the input file or by changing the PDF converter options?

Thanks for your help.

Re: Docx4j: Convert to PDF, formatting problems

Posted: Thu Jun 05, 2014 12:37 am
by jason
AndyRandy wrote:1. centered header image (jpg), output is left aligned


Your image is absolutely positioned:

      <w:drawing>
        <wp:anchor distT="0" distB="0" distL="114300" distR="114300" simplePos="0" relativeHeight="251659264"
                  behindDoc="0" locked="0" layoutInCell="1" allowOverlap="1">
          <wp:simplePos x="0" y="0"/>
          <wp:positionH relativeFrom="page">
            <wp:posOffset>2682240</wp:posOffset>
          </wp:positionH>
          <wp:positionV relativeFrom="paragraph">
            <wp:posOffset>-71120</wp:posOffset>
          </wp:positionV>
          <wp:extent cx="2753995" cy="524510"/>
          <wp:effectExtent l="0" t="0" r="8255" b="8890"/>
          <wp:wrapSquare wrapText="bothSides"/>
 


We have some support for floating text boxes in FOPictWriterAbstract, but not images.

Some support for floating images would be possible, but (as with text boxes) it would be limited by what XSL FO can express, particularly as implemented in FOP.

You could try using a table to centre the image.
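Before reworking a document this way, it may help to confirm which images are floating (wp:anchor) rather than inline (wp:inline). The following is only a rough sketch using the JDK's XML parser, not anything from docx4j: the class name DrawingKindCheck and the method count are made up for illustration, and it assumes you have already read the XML of a part (for example a header part) out of the .docx zip into a String.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of docx4j.
public class DrawingKindCheck {
    // Namespace bound to the wp: prefix (wp:anchor, wp:inline)
    public static final String WP_NS =
        "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing";

    /** Counts elements named localName in the wp namespace within one part's XML. */
    public static int count(String partXml, String localName) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = dbf.newDocumentBuilder().parse(
                new ByteArrayInputStream(partXml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagNameNS(WP_NS, localName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse part XML", e);
        }
    }

    public static void main(String[] args) {
        // Tiny made-up sample: one floating drawing and one inline drawing
        String sample =
            "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
            + " xmlns:wp=\"" + WP_NS + "\">"
            + "<w:r><w:drawing><wp:anchor/></w:drawing></w:r>"
            + "<w:r><w:drawing><wp:inline/></w:drawing></w:r>"
            + "</w:p>";
        System.out.println("floating: " + count(sample, "anchor"));
        System.out.println("inline:   " + count(sample, "inline"));
    }
}
```

Any part that reports a non-zero floating count is a candidate for switching the image to inline, or for the table approach suggested above.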

AndyRandy wrote:columns from docx are ignored, pdf writes text underneath


We don't support that yet. To track it, I've added https://github.com/plutext/docx4j/issues/117
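In the meantime, one way to find the documents that need adjusting is to look for section properties that declare more than one column. This is a minimal sketch under stated assumptions, again using only the JDK's XML parser: ColumnCheck and multiColumnSections are hypothetical names, the input is assumed to be the contents of word/document.xml already read into a String, and a section is treated as multi-column when w:cols has w:num greater than 1 or lists more than one w:col child.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of docx4j.
public class ColumnCheck {
    // Namespace bound to the w: prefix in WordprocessingML
    public static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns how many w:cols elements describe more than one column. */
    public static int multiColumnSections(String documentXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = dbf.newDocumentBuilder().parse(
                new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));
            NodeList cols = doc.getElementsByTagNameNS(W_NS, "cols");
            int hits = 0;
            for (int i = 0; i < cols.getLength(); i++) {
                Element e = (Element) cols.item(i);
                String num = e.getAttributeNS(W_NS, "num"); // "" when absent
                boolean byNum = !num.isEmpty() && Integer.parseInt(num.trim()) > 1;
                boolean byChildren = e.getElementsByTagNameNS(W_NS, "col").getLength() > 1;
                if (byNum || byChildren) {
                    hits++;
                }
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalStateException("Could not inspect document XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: one two-column section, one single-column section
        String sample = "<w:body xmlns:w=\"" + W_NS + "\">"
            + "<w:sectPr><w:cols w:num=\"2\" w:space=\"720\"/></w:sectPr>"
            + "<w:sectPr><w:cols w:space=\"720\"/></w:sectPr>"
            + "</w:body>";
        System.out.println("multi-column sections: " + multiColumnSections(sample));
    }
}
```

A non-zero result only flags the document for manual review; it does not change how the PDF is generated.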

Re: Docx4j: Convert to PDF, formatting problems

Posted: Thu Jun 05, 2014 2:00 am
by AndyRandy
Thanks for your reply.

In that case, I will have to adjust my DOCX files for now. I look forward to seeing this supported in a future release.