
tabs problem during docx2pdf conversion

Posted: Thu Nov 29, 2012 1:52 am
by salocinx
I'm trying to convert a WordprocessingMLPackage instance to PDF. This is the code I am using:

Code:
// Load the source document and map fonts one-to-one
WordprocessingMLPackage document = WordprocessingMLPackage.load(new FileInputStream(new File("c:/report.docx")));
Mapper fontMapper = new IdentityPlusMapper();
document.setFontMapper(fontMapper);
File fo = new File("c:/report.fo");
File pdf = new File("c:/report.pdf");
// Convert via XSL-FO and keep the intermediate FO file for inspection
org.docx4j.convert.out.pdf.PdfConversion converter = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(document);
((org.docx4j.convert.out.pdf.viaXSLFO.Conversion)converter).setSaveFO(fo);
OutputStream os = new FileOutputStream(pdf);
converter.output(os, new PdfSettings());
os.close();


Basically the conversion works. But there are two problems I face:

1. the tab stops in the PDF file do not match the tab stops set in the corresponding MS Word file

2. during conversion I get the following error:

Code:
*ERROR* NamespacePrefixMapperUtils: name: com.sun.xml.internal.bind.namespacePrefixMapper value: org.docx4j.jaxb.NamespacePrefixMapperSunInternal@2f80b005 .. trying RI. (NamespacePrefixMapperUtils.java, line 63)


Are these two issues somehow linked?
Any ideas on how to solve them?

I attached the report.docx (source) and the report.pdf (target) files.

report.docx (14.07 KiB)

report.pdf (35.1 KiB)


Thank you for your help in advance!

Re: tabs problem during docx2pdf conversion

Posted: Thu Nov 29, 2012 8:04 am
by jason
For tabs, the XSLT just does:

Code:
<xsl:template match="w:tab">
  <xsl:call-template name="OutputTlcChar">
    <xsl:with-param name="tlc">
      <xsl:text disable-output-escaping="yes">&#160;</xsl:text>
    </xsl:with-param>
    <xsl:with-param name="count" select="3"/>
  </xsl:call-template>
</xsl:template>

<xsl:template name="OutputTlcChar">
  <xsl:param name="count" select="0"/>
  <xsl:param name="tlc" select="' '"/>
  <xsl:value-of select="$tlc"/>
  <xsl:if test="$count > 1">
    <xsl:call-template name="OutputTlcChar">
      <xsl:with-param name="count" select="$count - 1"/>
      <xsl:with-param name="tlc" select="$tlc"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>
 


This is because XSL-FO has no support for tabs, so each tab is simply output as three non-breaking spaces. That might change; see this 2.0 working draft: http://www.w3.org/TR/2012/WD-xslfo20-20 ... #tab-stops
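
To make the effect concrete, here is a rough plain-Java sketch of what the w:tab template does to a line of text. It is not docx4j code; the class and method names (TabAsSpaces, outputTlcChar, replaceTabs) are made up for illustration, and the sample lines are invented. It only shows why the second "column" stops lining up once every tab becomes a fixed run of three non-breaking spaces.

```java
final class TabAsSpaces {
    static final String NBSP = "\u00A0";

    // Mirrors the recursive OutputTlcChar template:
    // emit tlc once, then recurse while count > 1.
    static String outputTlcChar(String tlc, int count) {
        if (count > 1) {
            return tlc + outputTlcChar(tlc, count - 1);
        }
        return tlc;
    }

    // Mirrors the w:tab template: every tab becomes three non-breaking spaces.
    static String replaceTabs(String text) {
        return text.replace("\t", outputTlcChar(NBSP, 3));
    }

    public static void main(String[] args) {
        String[] lines = { "Name\tValue", "Temperature\t21" };
        for (String line : lines) {
            String converted = replaceTabs(line);
            // Character offset at which the second field starts
            System.out.println(converted.lastIndexOf(NBSP) + 1);
        }
        // prints 7, then 14: the second fields no longer share a column
    }
}
```

In Word both second fields would start at the same tab stop; after the replacement their start position depends on the length of the text in front of the tab.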

Microsoft has some recommendations on how to avoid relying on tabs, at http://msdn.microsoft.com/en-us/library ... =office.11).aspx

Many Office users use the TAB key to alter spacing, create indentations, or create table structures. While this method is acceptable with Word, there is no easy way to reproduce the same behavior with XSL-FO because it lacks the equivalent of a tab mark. Although the Word2FO.xsl style sheet provides some methods to represent tab stops, the resulting output may not look exactly like the original Word document. Users are encouraged to use different formatting techniques instead of tab marks. For example:

To create first-line indents or negative indents in paragraphs. Replace tab marks by using explicit indentation.

To create table structures. Replace tab marks with explicit tables.

To create dotted rules in a table of contents, fill-in form fields, and so on. You can approximate this by applying a specific Underline style to a sequence of spaces, though this lacks the "stretch ability" found with the dotted rules in a table of contents formatted using tab marks.
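
If you generate the document content yourself, the "replace tab marks with explicit tables" advice could start with something like the following sketch. It is plain Java with no docx4j calls; the class TabsToRows and its methods are hypothetical names, and the sample data is invented. It only splits tab-separated lines into rows of cell texts, which you would then hand to whatever code builds the actual table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TabsToRows {
    // Splits each line on tab marks; the -1 limit keeps trailing empty cells.
    static List<List<String>> toRows(List<String> lines) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(Arrays.asList(line.split("\t", -1)));
        }
        return rows;
    }

    // The table needs as many columns as the widest row.
    static int columnCount(List<List<String>> rows) {
        int max = 0;
        for (List<String> row : rows) {
            max = Math.max(max, row.size());
        }
        return max;
    }

    public static void main(String[] args) {
        List<List<String>> rows =
                toRows(Arrays.asList("Name\tValue", "Temperature\t21"));
        System.out.println(rows.size() + " rows, " + columnCount(rows) + " columns");
        for (List<String> row : rows) {
            System.out.println(row);
        }
    }
}
```

Because table cells are laid out by the FO renderer itself, the columns stay aligned regardless of how long the text in the first cell is.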


Here are three posts from the FOP mailing lists which are relevant:

June 2012, some discussion of the problem: http://www.junlu.com/list/72/1037851.html

May 2009, with an extension function: http://apache-fop.1065347.n5.nabble.com ... 15820.html

Nov 2006, an XSLT implementation: http://mail-archives.apache.org/mod_mbo ... dora.be%3E

*ERROR* NamespacePrefixMapperUtils


This is a different, unrelated issue. Please start a separate thread, and explain which version of docx4j you are using, which Java version (output of java -version), and which JAXB implementation.

Re: tabs problem during docx2pdf conversion

Posted: Fri Nov 30, 2012 8:22 pm
by salocinx
Hi jason, many thanks for the fast reply. I've chosen the table approach you suggested and it fits my needs :-) If the other problem persists, I'll open a new thread.