
Russian characters in PDF output

Posted: Thu Aug 02, 2012 12:57 am
by jason
risto45 wrote: What I'm doing is: I load a docx file, replace some placeholders, generate some tables and convert the result into PDF.

Everything works fine on my development machine (running Windows 7).
But now that I have installed the same thing on our test server (running CentOS Linux), everything seems to work: the PDF is generated and the placeholders are replaced,
but all the Russian characters are replaced with question marks. The doc contains both Latin and Russian letters.
All the Latin letters are fine, but the Russian letters are missing.

Can you help?


You'll need to set up a FontMapper to map whatever font is used for Russian in the document, to an appropriate font available on the CentOS box.

Hopefully that'll do the trick.

You can run the PDF sample without an input docx, to generate a list of available fonts.

Russian characters in PDF output

Posted: Fri Aug 03, 2012 12:55 am
by risto45
The problem seems to be related to Linux and the installed fonts.
It seems that docx4j doesn't find a suitable font (one that includes Cyrillic letters).

But I've tried several things and none of them seems to work.
I installed Arial Unicode MS, which contains Cyrillic letters.
My original docx uses Arial.
So I did:
Code:
Mapper fontMapper = new IdentityPlusMapper();
template.setFontMapper(fontMapper);

PhysicalFont font = PhysicalFonts.getPhysicalFonts().get("Arial Unicode MS");
fontMapper.getFontMappings().put("Arial", font);

But that didn't help.

If I do:
Code:
Map<String, PhysicalFont> fonts = PhysicalFonts.getPhysicalFonts();

the map is always empty. Seems like docx4j doesn't find any installed fonts.

But when I do:
Code:
GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
String[] fontFamilies = ge.getAvailableFontFamilyNames();

That finds all installed font families (including Arial Unicode MS).
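
An installed family does not necessarily cover Cyrillic, so a check along the following lines might help narrow down the candidates. This is only a sketch using plain java.awt, not docx4j: the class name CyrillicCoverage, the method familiesCovering and the sample word are made up for illustration, and what AWT reports may differ from what docx4j discovers.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

class CyrillicCoverage {

    // Returns the families whose plain style can display every character of the sample.
    static List<String> familiesCovering(String[] families, String sample) {
        List<String> result = new ArrayList<>();
        for (String family : families) {
            Font font = new Font(family, Font.PLAIN, 12);
            // canDisplayUpTo returns -1 when the whole string is displayable
            if (font.canDisplayUpTo(sample) == -1) {
                result.add(family);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A short Russian word ("privet"), written with Unicode escapes
        String sample = "\u041F\u0440\u0438\u0432\u0435\u0442";
        String[] families = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        for (String family : familiesCovering(families, sample)) {
            System.out.println(family);
        }
    }
}
```

Logical families such as Dialog or Serif may report coverage through fallback fonts, so the physical family names in the output are probably the more useful ones.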

I must admit that I'm not very good at Linux administration or font management on Linux.
Do you have any other suggestions for what I can try?
Can I somehow tell docx4j where to look for font files? I know that iText searches for font files on the classpath.

BR,
Risto

Re: tmp

Posted: Fri Aug 03, 2012 2:55 am
by jason
To see what is happening, turn on logging in org.docx4j.fonts.

Try ConvertOutPDF with null inputfilepath, to see a document output with different fonts.

If it still doesn't work, please upload a sample Russian docx, and I'll take a look.

Re: Russian characters in PDF output

Posted: Sat Aug 04, 2012 12:06 am
by risto45
Thanks for the ConvertOutPDF hint.
Now the Russian letters are just fine.

My problem was that when I did:
Code:
PhysicalFont font = PhysicalFonts.getPhysicalFonts().get("Arial Unicode MS");
fontMapper.getFontMappings().put("Arial Unicode MS", font);

I used the wrong key. The correct key was "arialuni", not "Arial Unicode MS".

And when I tried to list all the physical fonts (PhysicalFonts.getPhysicalFonts()), I hadn't called PhysicalFonts.discoverPhysicalFonts() first.
That's why it didn't find any physical fonts, and therefore I didn't know the correct key.
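
For anyone hitting the same key mismatch, a small helper like the one below could make the real key easier to spot. It is a sketch over a plain Map, so it does not depend on docx4j: the class name FontKeyFinder and the method findKeys are hypothetical, and apart from "arialuni" the keys in main are stand-ins rather than real discovery output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class FontKeyFinder {

    // Returns, in sorted order, every key that contains the fragment, ignoring case.
    static List<String> findKeys(Map<String, ?> fonts, String fragment) {
        String needle = fragment.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String key : fonts.keySet()) {
            if (key.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(key);
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for the font map; the values are irrelevant to the lookup
        Map<String, Object> fonts = new HashMap<>();
        fonts.put("arialuni", new Object());
        fonts.put("some other font", new Object());
        System.out.println(findKeys(fonts, "Arial"));
    }
}
```

With docx4j, the map passed in would presumably be the one returned by PhysicalFonts.getPhysicalFonts(), after PhysicalFonts.discoverPhysicalFonts() has been called.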

So now all the Russian characters appear in my output, but there is still one more issue.
All the bold characters are now regular.
I know this happens because Arial Unicode MS doesn't have bold glyphs.
So I downloaded a new font, "Arial Cyr", which comes in four files: Arial Cyr, Arial Cyr Bold, Arial Cyr Bold Italic and Arial Cyr Italic.
I saw from the ConvertOutPDF example that I should be able to detect whether a font has a bold or italic form with the following code:
Code:
PhysicalFont pfVariation = PhysicalFonts.getBoldForm(pf);

But I don't find a bold form for Arial Cyr.

How can I configure docx4j so that it also finds the bold and italic forms?

Thanks again. You have been a great help.

Rgs,
Risto