
PDF Conversion Header issues

Posted: Wed Dec 18, 2013 7:06 am
by todd
I'm using docx4j 3.0.0 with Word 2010 and trying to convert a document to PDF using the following sample code (ConvertPDF):
Code:
public boolean convert(WordprocessingMLPackage wMLP) throws Docx4JException {

      // Font regex (optional)
      // Set regex if you want to restrict to some defined subset of fonts
      // Here we have to do this before calling createContent,
      // since that discovers fonts
      // String regex = null;
      // Windows:
      String regex = ".*(calibri|cour|arial|times|comic|georgia|impact|LSANS|pala|tahoma|trebuc|verdana|symbol|webdings|wingding).*";
      PhysicalFonts.setRegex(regex);

      // Document loading (required)
      // Set up font mapper (optional)
         
      Mapper fontMapper = new IdentityPlusMapper();
      try {
         wMLP.setFontMapper(fontMapper);
      } catch (Exception e) {
         // TODO Auto-generated catch block
         e.printStackTrace();
      }
         
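      // Example of overriding a font mapping: here Calibri is rendered
      // using the installed Times New Roman. The lookup returns null if
      // that font was not discovered, so check it before relying on it.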
      PhysicalFont font = PhysicalFonts.getPhysicalFonts().get(
            "Times New Roman");
      fontMapper.getFontMappings().put("Calibri", font);
         
      // New code
      FOSettings foSettings = Docx4J.createFOSettings();
      if (saveFO) {
         foSettings.setFoDumpFile(new java.io.File(DIR_OUT + inputfile + ".fo"));
      }
      foSettings.setWmlPackage(wMLP);

      // The exporter writes to an OutputStream; try-with-resources closes it
      // even if the conversion throws.
      String outputFile = FilenameUtils.removeExtension(inputfile) + ".pdf";
      try (OutputStream os = new java.io.FileOutputStream(DIR_OUT + outputFile)) {
         // Don't care what type of exporter you use
         Docx4J.toFO(foSettings, os, Docx4J.FLAG_NONE);
         System.out.println("Saved " + DIR_OUT + outputFile);
         return true;
      } catch (java.io.IOException e) {
         // Covers FileNotFoundException as well as failures on close
         e.printStackTrace();
         return false;
      }
      // end new code
   }

When I create a simple Word document with Word and with docx4j programmatically, the header runs into the document body section after conversion.
This appears to happen only with docx4j 3.0.0.
When I ran the online PDF conversion tool on the document, the header looked OK.
When I ran the ConvertPDF sample code for docx4j 2.8, the header also looked OK.
It appears that only the latest version, docx4j 3.0.0, causes the issue.

Enclosed are the source document and sample converted PDF documents.
Please advise.
Thanks

Re: PDF Conversion Header issues HELP

Posted: Fri Dec 20, 2013 6:01 am
by todd
Jason,
There appears to be an issue with docx4j 3.0 when converting docx documents to PDF.
The sample I enclosed shows that, in the converted PDF, the main document starts at a fixed location after the headers, so the header runs into the main document contents. Is there a way to override this?
The FOSettings class references apacheFopxxx configurations. Where does docx4j use these default configurations?
This is on the critical path of our development: we have already created documents using docx4j, and now we need a PDF converter and wanted to use docx4j for that as well. I read in one of your other forum posts that you were not supporting fixes for PDF conversions.
Word converts the same document to PDF just fine, but docx4j does not.
Any input would be appreciated.

Thanks

Re: PDF Conversion Header issues

Posted: Thu Mar 06, 2014 9:39 pm
by korinna
Hi,
I have the same issue.
It's a serious problem for me. :(
Could someone please help me?

Thanks....

Re: PDF Conversion Header issues

Posted: Fri Mar 21, 2014 10:42 am
by todd
Any news on a resolution? This has become a critical path for us too. Is there at least an alternative solution?
This is especially pressing now that we have purchased the Plutext-Enterprise 3.0.1.0 edition.
Thanks

Re: PDF Conversion Header issues

Posted: Fri Mar 21, 2014 12:45 pm
by jason
Please see https://github.com/plutext/docx4j/raw/m ... oters.docx for an overview of the issues involved,
and especially line 221 and following of https://github.com/plutext/docx4j/blob/ ... ilder.java

The issue here is a difference in models between Word/docx on the one hand, and XSL FO (for PDF) on the other.
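To illustrate the XSL FO side of that difference, here is a minimal sketch (not docx4j's actual layout code; the class and method names are hypothetical). In XSL FO the header sits in fo:region-before, whose height is a fixed "extent", and fo:region-body only clears it if its margin-top is at least that extent. The sketch simply derives the body margin from an assumed header height:

```java
import java.util.Locale;

public class PageMasterSketch {

    /**
     * Builds an fo:simple-page-master fragment whose body starts below the
     * header region. All lengths are in points; the values are assumptions
     * supplied by the caller, not measured from a real docx.
     */
    static String pageMaster(double pageMarginTopPt, double headerExtentPt, double gapPt) {
        // region-body does not move out of the way of region-before by itself:
        // if its margin-top is smaller than the header extent, the two overlap.
        double bodyMarginTopPt = headerExtentPt + gapPt;
        return String.format(Locale.ROOT,
                "<fo:simple-page-master master-name=\"sketch\" margin-top=\"%.1fpt\">%n"
              + "  <fo:region-body margin-top=\"%.1fpt\"/>%n"
              + "  <fo:region-before extent=\"%.1fpt\"/>%n"
              + "</fo:simple-page-master>",
                pageMarginTopPt, bodyMarginTopPt, headerExtentPt);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 36pt page margin, 60pt tall header, 6pt gap.
        System.out.println(pageMaster(36, 60, 6));
    }
}
```

The hard part in practice is knowing the header height before layout, since the FO page master needs the number up front.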

A simple brute force fix individual users could try might be to alter some of the constants. This might be sufficient if your documents always use the same headers. But a more general improvement would be desirable.

Todd, as an Enterprise customer please raise a support incident via support@plutext.com for attention to this.

In the absence of improvements to docx4j in this area, one could use LibreOffice for PDF conversion (via JODConverter), or one of the many cloud based conversion services.
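As a rough sketch of the LibreOffice route, the following calls LibreOffice's headless command line directly rather than going through JODConverter, which keeps the example free of extra dependencies. It assumes an `soffice` executable is on the PATH; the class and method names are hypothetical:

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class LibreOfficePdfSketch {

    /** Builds the headless LibreOffice command line for a docx-to-PDF conversion. */
    static List<String> buildCommand(String sofficePath, File docx, File outDir) {
        return Arrays.asList(sofficePath, "--headless", "--convert-to", "pdf",
                "--outdir", outDir.getPath(), docx.getPath());
    }

    /** Runs the conversion and returns the exit code of the soffice process. */
    static int convert(String sofficePath, File docx, File outDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(sofficePath, docx, outDir))
                .inheritIO()   // show LibreOffice's own messages on our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: LibreOfficePdfSketch <input.docx> <output-dir>");
            return;
        }
        // Assumption: "soffice" resolves via PATH; pass a full path otherwise.
        int exit = convert("soffice", new File(args[0]), new File(args[1]));
        System.out.println("soffice exited with " + exit);
    }
}
```

Starting a new LibreOffice process per document is slow; JODConverter exists largely to manage a long-running office process instead, so this is only suitable for low volumes.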

Re: PDF Conversion Header issues

Posted: Sat Mar 22, 2014 10:16 am
by todd
Thanks Jason, I will take a look

Re: PDF Conversion Header issues

Posted: Sat Mar 22, 2014 10:53 am
by jason
Todd, FYI: prompted by your post, I'm also revisiting this now.

Re: PDF Conversion Header issues

Posted: Mon Mar 24, 2014 12:52 pm
by jason
Please try (the newer than 3.0.1 nightly) http://www.docx4java.org/docx4j/docx4j- ... 140324.jar

Re: PDF Conversion Header issues

Posted: Wed Mar 26, 2014 11:09 am
by todd
Jason, this fix looks good.
What is the procedure to get a new Plutext-Enterprise build with these changes?
Thanks again.

Re: PDF Conversion Header issues

Posted: Thu Mar 27, 2014 3:13 pm
by jason
Hi Todd, thanks for the feedback.

The Enterprise Edition sits on top of a docx4j release. In the case of this fix, what is required is just a new docx4j release included on your class path. I'm planning a release incorporating this fix in the next week or so. Were that not the case, you could request a point release via the support channel.

cheers .. Jason
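
When swapping in a newer docx4j jar under the Enterprise Edition, it can help to confirm which jar is actually being picked up from the class path. A minimal sketch using only the JDK (the class and method names are hypothetical; `org.docx4j.Docx4J` is the facade class used in the code earlier in this thread):

```java
import java.security.CodeSource;

public class WhichJarSketch {

    /** Returns where the named class is loaded from, or a note if it cannot be found. */
    static String locate(String className) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class<?> c = Class.forName(className, false, WhichJarSketch.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            return source == null ? "(no code source reported)" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on class path)";
        }
    }

    public static void main(String[] args) {
        System.out.println("org.docx4j.Docx4J loaded from: " + locate("org.docx4j.Docx4J"));
    }
}
```

If two docx4j versions are on the class path, the location printed shows which one the class loader found first.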