
Chinese characters became '#' while converting to PDF

Posted: Sat Feb 15, 2014 4:13 am
by aohajin
I've been using docx4j for a while and it's really amazing! But recently I ran into a problem that I can't fix... :shock:

I use docx4j 3.0.0 via sbt, and I am trying to produce a sample PDF with the following code:
Code:
FOSettings foSettings = Docx4J.createFOSettings();
foSettings.setWmlPackage(wordMLPackage);

// Map the font name used in the document to a physical font on this machine
Mapper fontMapper = new BestMatchingMapper();
fontMapper.getFontMappings().put("Simsun", PhysicalFonts.getPhysicalFonts().get("SimSun"));
wordMLPackage.setFontMapper(fontMapper);

/*
for (String name : PhysicalFonts.getPhysicalFonts().keySet()) {
    Logger.info("found font:" + name);
}
*/

OutputStream os = new java.io.FileOutputStream(settings.Constant.DEBUG_PATH + "/test.pdf");
Docx4J.toFO(foSettings, os, Docx4J.FLAG_NONE);


But when I run this simple program I get a PDF with '#' everywhere, where there should be Chinese characters... :shock:
I run it on my Mac, which does have the needed font 'Simsun'. The program reports no error or exception while executing.

I read some old posts in this forum, and when I used Reveal Formatting in Word on those characters I did get 'font:SImsun' >"<

I also tried the demo webapp on this site, but unfortunately I still got ############# :shock:

Here are my docx and the resulting PDF. I would really appreciate it if someone could help me with this... >"<

Re: Chinese characters became '#' while converting to PDF

Posted: Tue Feb 18, 2014 8:50 pm
by jason
Fixed. Please try http://www.docx4java.org/docx4j/docx4j- ... 140218.jar

One of the issues was that your paragraphs have:

       <w:rFonts w:ascii="SimSun" w:hAnsi="SimSun" w:cs="SimSun"/>
 


Your docDefaults are:

<w:styles >
  <w:docDefaults>
    <w:rPrDefault>
      <w:rPr>
        <w:rFonts w:asciiTheme="minorHAnsi" w:eastAsiaTheme="minorEastAsia" w:hAnsiTheme="minorHAnsi" w:cstheme="minorBidi"/>
 


The problem was that docx4j 3.0.1 combines these to produce:

    <w:rFonts w:ascii="SimSun" w:hAnsi="SimSun" w:cs="SimSun" w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:eastAsiaTheme="minorEastAsia" w:cstheme="minorBidi"/>
 


and then, from that, uses the font Calibri, not SimSun.

http://webapp.docx4java.org/OnlineDemo/ ... Fonts.html says the theme attribute takes priority over the corresponding non-theme attribute.
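To make that priority rule concrete, here is a minimal sketch in plain Java. It is not docx4j code: the class and method names are hypothetical, only the ascii slot is modelled, and it assumes the document theme maps minorHAnsi to Calibri. It shows why the naively combined attribute set ends up on the theme font, while the paragraph's own attributes alone would give SimSun.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; none of these names come from docx4j.
class RFontsPrioritySketch {

    // Assumption: the document theme maps minorHAnsi to Calibri.
    static final Map<String, String> THEME_FONTS =
            Collections.singletonMap("minorHAnsi", "Calibri");

    // rFonts attributes set directly on the paragraph's runs.
    static Map<String, String> paragraphFonts() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("w:ascii", "SimSun");
        m.put("w:hAnsi", "SimSun");
        m.put("w:cs", "SimSun");
        return m;
    }

    // rFonts attributes coming from docDefaults.
    static Map<String, String> docDefaultFonts() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("w:asciiTheme", "minorHAnsi");
        m.put("w:eastAsiaTheme", "minorEastAsia");
        m.put("w:hAnsiTheme", "minorHAnsi");
        m.put("w:cstheme", "minorBidi");
        return m;
    }

    // Naive merge: keep every inherited attribute, then add the direct ones.
    static Map<String, String> merge(Map<String, String> inherited,
                                     Map<String, String> direct) {
        Map<String, String> merged = new LinkedHashMap<>(inherited);
        merged.putAll(direct);
        return merged;
    }

    // Resolve the ascii slot: w:asciiTheme, when present, beats w:ascii.
    static String resolveAscii(Map<String, String> rFonts) {
        String themeKey = rFonts.get("w:asciiTheme");
        if (themeKey != null && THEME_FONTS.containsKey(themeKey)) {
            return THEME_FONTS.get(themeKey);
        }
        return rFonts.get("w:ascii");
    }

    public static void main(String[] args) {
        Map<String, String> merged = merge(docDefaultFonts(), paragraphFonts());
        System.out.println(resolveAscii(paragraphFonts())); // SimSun
        System.out.println(resolveAscii(merged));           // Calibri
    }
}
```

The merged map still carries w:asciiTheme from docDefaults, so the theme font wins even though the paragraph explicitly asks for SimSun.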

For the fixes, please see commits of today including https://github.com/plutext/docx4j/commi ... 3749845ec5

Re: Chinese characters became '#' while converting to PDF

Posted: Tue Feb 18, 2014 9:34 pm
by jason
I forgot to mention, there's a character in your docx which uses font "Libian SC Regular".

I don't have that font, so I added:

fontMapper.getFontMappings().put("Libian SC Regular", PhysicalFonts.getPhysicalFonts().get("SimSun"));

(I was using IdentityPlusMapper).
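The idea behind that extra mapping, sketched in plain Java without docx4j: if the font a document asks for is not installed, fall back to a substitute that is. The class name, method name, and font sets below are hypothetical and only illustrate the lookup; this is not how docx4j's mappers are implemented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a font substitution lookup.
class FontFallbackSketch {

    // Returns the font to render with, or null if nothing usable is found.
    static String resolve(String requested,
                          Set<String> installed,
                          Map<String, String> substitutes) {
        if (installed.contains(requested)) {
            return requested; // the requested font is available as-is
        }
        String substitute = substitutes.get(requested);
        if (substitute != null && installed.contains(substitute)) {
            return substitute; // use the configured replacement
        }
        return null; // no font that can supply the glyphs
    }

    public static void main(String[] args) {
        // Assumed set of fonts present on the machine.
        Set<String> installed = new HashSet<>(Arrays.asList("SimSun", "Calibri"));

        Map<String, String> substitutes = new HashMap<>();
        substitutes.put("Libian SC Regular", "SimSun");

        System.out.println(resolve("SimSun", installed, substitutes));            // SimSun
        System.out.println(resolve("Libian SC Regular", installed, substitutes)); // SimSun
        System.out.println(resolve("Some Other Font", installed, substitutes));   // null
    }
}
```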

Re: Chinese characters became '#' while converting to PDF

Posted: Fri Feb 21, 2014 6:44 am
by aohajin
jason wrote: I forgot to mention, there's a character in your docx which uses font "Libian SC Regular".

I don't have that font, so I added:

fontMapper.getFontMappings().put("Libian SC Regular", PhysicalFonts.getPhysicalFonts().get("SimSun"));

(I was using IdentityPlusMapper).



Thanks a lot! :D