docx to PDF conversion issues.

Posted: Fri Sep 05, 2014 6:23 pm
by MohanInturi
Hi Jason,

Please find the attached files. I am trying to convert a docx file to PDF using the ConvertOutPDF.java example, but the generated PDF adds too much space between the existing spaces in the docx, and the Chinese characters are shown as #. Is there anything I am missing here?

Attachments: docbookmark12.docx, docbookmark12.pdf, ConvertOutPDF.java

Re: docx to PDF conversion issues.

Posted: Fri Sep 05, 2014 10:27 pm
by jason
What version of docx4j?

If not 3.2.0, please try that.

Re: docx to PDF conversion issues.

Posted: Fri Sep 05, 2014 10:39 pm
by MohanInturi
Thanks, but I am already using version 3.2.0.

Attachment: shot.jpg

Re: docx to PDF conversion issues.

Posted: Sat Sep 06, 2014 3:26 pm
by jason
As far as the Chinese is concerned, you have:

    <w:p w:rsidR="007A778C" w:rsidRDefault="007A778C">
      <w:pPr>
        <w:pStyle w:val="DocumentTemplateInformationBox"/>
        <w:keepNext/>
      </w:pPr>
      <w:r>
        <w:t>simplified Chinese one: 汉字</w:t>
      </w:r>
    </w:p>

          <w:p w:rsidR="00FE29DB" w:rsidRDefault="00FE29DB" w:rsidP="00023731">
            <w:pPr>
              <w:pStyle w:val="DocumentTemplateType"/>
              <w:rPr>
                <w:rFonts w:ascii="GE Inspira" w:hAnsi="GE Inspira"/>
                <w:b w:val="0"/>
                <w:bCs/>
                <w:color w:val="auto"/>
                <w:sz w:val="28"/>
              </w:rPr>
            </w:pPr>

            <w:r w:rsidRPr="00353DF0">
              <w:rPr>
                <w:rFonts w:ascii="GE Inspira" w:hAnsi="GE Inspira"/>
                <w:color w:val="auto"/>
              </w:rPr>
              <w:t xml:space="preserve">simplified Chinese two: </w:t>
            </w:r>
            <w:r w:rsidRPr="00353DF0">
              <w:rPr>
                <w:rFonts w:ascii="MingLiU" w:eastAsia="MingLiU" w:hAnsi="MingLiU" w:cs="MingLiU" w:hint="eastAsia"/>
                <w:color w:val="auto"/>
              </w:rPr>
              <w:t></w:t>
            </w:r>
            <w:r w:rsidRPr="00353DF0">
              <w:rPr>
                <w:rFonts w:ascii="MS Gothic" w:eastAsia="MS Gothic" w:hAnsi="MS Gothic" w:cs="MS Gothic" w:hint="eastAsia"/>
                <w:color w:val="auto"/>
              </w:rPr>
              <w:t></w:t>
            </w:r>

          </w:p>


The second paragraph works for me (provided I don't exclude the relevant fonts with PhysicalFonts.setRegex).

The first paragraph is effectively <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:eastAsia="Times New Roman" w:cs="Times New Roman"/>

and on my PC that gives:

WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "汉" (0x6c49) not available in font "ArialMT".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "字" (0x5b57) not available in font "ArialMT".

I can work around that by mapping Arial to a font which does contain those glyphs: fontMapper.put("Arial", PhysicalFonts.get("Arial Unicode MS"));

I knew to use "Arial" as the key, rather than "ArialMT", from this debug output:

DEBUG org.docx4j.fonts.PhysicalFonts .addPhysicalFont line 334 - Processing physical font: file:/c:/windows/fonts/arial.ttf
DEBUG org.docx4j.fonts.PhysicalFonts .addPhysicalFont line 409 - Added 'Arial' -> file:/C:/Windows/FONTS/arial.ttf
DEBUG org.docx4j.fonts.PhysicalFonts .addPhysicalFont line 431 - added to filename map: arial.ttf
DEBUG org.docx4j.fonts.PhysicalFonts .addPhysicalFont line 448 - -------
ArialMT
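
If a document triggers a lot of these glyph warnings, it may help to summarise them before deciding which fonts to remap. What follows is only a rough sketch in plain Java: the class and method names (MissingGlyphs, missingByFont) are made up for illustration and are not part of docx4j or FOP, and the regular expression assumes the warning text has exactly the shape shown in the WARN lines above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of docx4j or FOP.
public class MissingGlyphs {

    // Assumed warning shape: Glyph "X" (0x6c49) not available in font "ArialMT".
    private static final Pattern WARNING = Pattern.compile(
        "Glyph \"(.+?)\" \\(0x([0-9a-fA-F]+)\\) not available in font \"([^\"]+)\"");

    /** Groups the missing code points (as U+XXXX strings) by the font name reported in the log. */
    public static Map<String, List<String>> missingByFont(List<String> logLines) {
        Map<String, List<String>> result = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = WARNING.matcher(line);
            if (m.find()) {
                result.computeIfAbsent(m.group(3), k -> new ArrayList<>())
                      .add("U+" + m.group(2).toUpperCase());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The two WARN lines from this thread; the glyphs are written as
        // \\u escapes so the source file encoding does not matter.
        List<String> lines = Arrays.asList(
            "WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - "
                + "Glyph \"\u6c49\" (0x6c49) not available in font \"ArialMT\".",
            "WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - "
                + "Glyph \"\u5b57\" (0x5b57) not available in font \"ArialMT\".");
        System.out.println(missingByFont(lines));
        // prints {ArialMT=[U+6C49, U+5B57]}
    }
}
```

Note that the font name in the warning (ArialMT) is not the key to pass to fontMapper.put; as above, the key ("Arial") still has to be read from the PhysicalFonts debug output.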


The other formatting issues are interesting edge cases (in this rather unusual docx!). I may look at those separately if I get a chance.