Plutext

Posted: **Wed Dec 18, 2013 7:06 am**

I'm using docx4j 3.0.0 with Word 2010 and trying to convert a document to PDF using the following sample code, ConvertPDF:

Code:

```java
public boolean convert(WordprocessingMLPackage wMLP) throws Docx4JException {

    // Font regex (optional)
    // Set regex if you want to restrict to some defined subset of fonts
    // Here we have to do this before calling createContent,
    // since that discovers fonts
    // String regex = null;
    // Windows:
    String regex = ".*(calibri|cour|arial|times|comic|georgia|impact|LSANS|pala|tahoma|trebuc|verdana|symbol|webdings|wingding).*";
    PhysicalFonts.setRegex(regex);

    // Document loading (required)

    // Set up font mapper (optional)
    Mapper fontMapper = new IdentityPlusMapper();
    try {
        wMLP.setFontMapper(fontMapper);
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    // .. example of mapping missing font Algerian to installed font Comic Sans MS
    PhysicalFont font = PhysicalFonts.getPhysicalFonts().get("Times New Roman");
    fontMapper.getFontMappings().put("Calibri", font);

    // New code
    FOSettings foSettings = Docx4J.createFOSettings();
    if (saveFO) {
        foSettings.setFoDumpFile(new java.io.File(DIR_OUT + inputfile + ".fo"));
    }
    foSettings.setWmlPackage(wMLP);

    // exporter writes to an OutputStream.
    try {
        String outputFile = FilenameUtils.removeExtension(inputfile) + ".pdf";
        OutputStream os = new java.io.FileOutputStream(DIR_OUT + outputFile);

        // Don't care what type of exporter you use
        Docx4J.toFO(foSettings, os, Docx4J.FLAG_NONE);
        System.out.println("Saved " + DIR_OUT + outputFile);
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    // end new code

    return false;
}
```

When I create a simple Word document (either in Word or programmatically with docx4j), the header is truncated and runs into the document body during conversion.
This appears to happen only with docx4j 3.0.0.
I ran the online PDF conversion tool on the document, and the header looks OK.
I ran the ConvertPDF sample code with docx4j 2.8, and the header looks OK.
So it appears that only the latest version, docx4j 3.0.0, causes the issue.

Enclosed are the source doc and sample converted PDF documents.
Please advise.
Thanks

Posted: **Fri Dec 20, 2013 6:01 am**

Jason,
There appears to be an issue with docx4j 3.0 when converting docx documents to PDF.
The sample I enclosed shows the converted PDF with the main document starting at a fixed location after the header, so the header is truncated and runs into the main document content. Is there a way to override this?
The FOSettings class references apacheFopxxx configurations; where does docx4j use these default configurations?
This is on the critical path of our development: we have already created documents using docx4j, and now we need a PDF converter and wanted to use docx4j for that as well. I read in one of your other forum posts that you were not supporting fixes for PDF conversion.
I can use Word to convert the same document to PDF just fine, but docx4j cannot.
Any input would be appreciated.

Thanks

Posted: **Thu Mar 06, 2014 9:39 pm**

Hi,
I have the same issue.
It's a serious problem for me.

Could someone please help me?

Thanks....

Posted: **Fri Mar 21, 2014 10:42 am**

Any news on a resolution? This has become a critical path for us too. Is there at least an alternative solution?
Especially since we purchased the Plutext-Enterprise 3.0.1.0 edition.
Thanks

Posted: **Fri Mar 21, 2014 12:45 pm**

Please see https://github.com/plutext/docx4j/raw/m ... oters.docx for an overview of the issues involved,
and especially line 221 and following of https://github.com/plutext/docx4j/blob/ ... ilder.java

The issue here is a difference in models between Word/docx on the one hand, and XSL FO (for PDF) on the other.

A simple brute-force fix that individual users could try is to alter some of the constants in that code. This might be sufficient if your documents always use the same headers. But a more general improvement would be desirable.
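
To make the model difference concrete, here is a minimal sketch in Java. It is not docx4j's actual code. It assumes Word's behaviour that the body starts at whichever is lower on the page: the top page margin, or the bottom of the header (header distance plus header height). In XSL FO, by contrast, `fo:region-before` has a fixed extent and `fo:region-body` needs an explicit `margin-top`, so both have to be known up front. The class and method names are hypothetical, and the header height is simply taken as an input (estimating it is the hard part).

```java
// Hypothetical sketch, not docx4j code: derive XSL FO page geometry from
// Word-style section settings so that the body clears a tall header.
final class HeaderExtentSketch {

    // Word stores page margins in twips (1/20 of a point).
    static final double TWIPS_PER_POINT = 20.0;

    /** Distance from the top of the page to the start of the body, in points. */
    static double bodyTopPt(int pgMarTopTwips, int headerDistTwips, double headerHeightPt) {
        double topMarginPt = pgMarTopTwips / TWIPS_PER_POINT;
        double headerBottomPt = headerDistTwips / TWIPS_PER_POINT + headerHeightPt;
        // A header taller than the space above the margin pushes the body down.
        return Math.max(topMarginPt, headerBottomPt);
    }

    /**
     * margin-top for fo:region-body, assuming the page master's own margin-top
     * is set to the header distance (region margins are measured inside it).
     */
    static double regionBodyMarginTopPt(int pgMarTopTwips, int headerDistTwips, double headerHeightPt) {
        return bodyTopPt(pgMarTopTwips, headerDistTwips, headerHeightPt)
                - headerDistTwips / TWIPS_PER_POINT;
    }

    public static void main(String[] args) {
        // 1 inch top margin, header 0.5 inch from the page edge, header 60pt tall.
        System.out.println("region-before extent: 60.0pt");
        System.out.println("region-body margin-top: "
                + regionBodyMarginTopPt(1440, 720, 60.0) + "pt");
    }
}
```

With a fixed constant in place of `headerHeightPt`, any header taller than that constant collides with the body, which matches the symptom reported in this thread.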

Todd, as an Enterprise customer please raise a support incident via support@plutext.com for attention to this.

In the absence of improvements to docx4j in this area, one could use LibreOffice for PDF conversion (via JODConverter), or one of the many cloud-based conversion services.
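
As a rough illustration of the LibreOffice route, the sketch below shells out to the `soffice` command line in headless mode rather than going through JODConverter (a deliberate simplification, since it needs no extra library). It assumes `soffice` is installed and on the PATH; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: convert a docx to PDF by invoking LibreOffice headlessly.
final class SofficePdfSketch {

    /** Builds the soffice command line; assumes "soffice" resolves via PATH. */
    static List<String> buildCommand(Path docx, Path outDir) {
        return Arrays.asList(
                "soffice", "--headless",
                "--convert-to", "pdf",
                "--outdir", outDir.toString(),
                docx.toString());
    }

    /** Runs the conversion and returns the process exit code. */
    static int convert(Path docx, Path outDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(docx, outDir))
                .inheritIO() // show LibreOffice's own messages on the console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: SofficePdfSketch <input.docx> <output-dir>");
            return;
        }
        int exit = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("soffice exited with " + exit);
    }
}
```

The PDF is written to the output directory under the input file's base name. For a long-running server, a managed office process such as the one JODConverter provides is likely a better fit than starting `soffice` per document.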

Posted: **Sat Mar 22, 2014 10:16 am**

Thanks Jason, I will take a look

Posted: **Sat Mar 22, 2014 10:53 am**

Todd, FYI: prompted by your post, I'm also revisiting this now.

Posted: **Mon Mar 24, 2014 12:52 pm**

Please try the nightly build (newer than 3.0.1): http://www.docx4java.org/docx4j/docx4j- ... 140324.jar

Posted: **Wed Mar 26, 2014 11:09 am**

Jason, this fix looks good.
What is the procedure to get a new Plutext-Enterprise build with these changes?
Thanks again.

Posted: **Thu Mar 27, 2014 3:13 pm**

Hi Todd, thanks for the feedback.

The Enterprise Edition sits on top of a docx4j release. In the case of this fix, what is required is just a new docx4j release included on your class path. I'm planning a release incorporating this fix in the next week or so. Were that not the case, you could request a point release via the support channel.

cheers .. Jason

PDF Conversion Header issues
