hindi,kannda,tamil,telugu

Posted: Tue Feb 18, 2014 9:30 pm
by dazzfayaz
Hi Jason,

I am converting a docx (which contains symbols and multiple languages such as Hindi, Kannada, Tamil, Telugu, etc.) to PDF.
The DOCX is generated properly with all symbols and languages, but the PDF is generated with #### characters in place of the non-English characters.
I am also losing the formatting (bold, italics, underline) in the PDF, although the docx is generated correctly.

Below is my conversion code.

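// Map each font named in the document to a physical font on this machine
Mapper fontMapper = new IdentityPlusMapper();
wordMLPackage.setFontMapper(fontMapper);
PdfConversion c = new Conversion(wordMLPackage);
String pdfFile = "C:/123.pdf";
// Note: this saves the docx package itself to the PDF path; the stream opened below then overwrites it
wordMLPackage.save(new File(pdfFile));
OutputStream os = new java.io.FileOutputStream(pdfFile);
PdfSettings pdfSettings = new PdfSettings();
pdfSettings.setWmlPackage(wordMLPackage);
// Write the PDF to the output stream
c.output(os, pdfSettings);
os.close();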

Please help me get the different languages and the formatting (bold and italic) into the PDF after converting it from DOCX.
Attached is the PDF that results from the conversion.

Thanks.

Re: hindi,kannda,tamil,telugu

Posted: Wed Feb 19, 2014 10:43 am
by jason
Your docx (extract attached) uses Cambria Math in bold, italic and bold italic forms.

FOP says:

Code:
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Font "Cambria Math,normal,700" not found. Substituting with "Cambria Math,normal,400".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Font "Cambria Math,italic,400" not found. Substituting with "Cambria Math,normal,400".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Font "Cambria Math,italic,700" not found. Substituting with "Cambria Math,normal,400".



I've addressed this with

https://github.com/plutext/docx4j/commi ... cb86896d2c
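
Each quoted name in the FOP warnings above is a font triplet of family, style and weight, where 700 is bold and 400 is regular. So all three styled variants of Cambria Math fall back to the regular face, which is why the bold and italic formatting is lost. The following is a minimal sketch for pulling those substitutions out of a log. It is not part of docx4j or FOP: the class name `FopFontWarnings` and the method `substitutions` are hypothetical, and it assumes the message wording shown in the log above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FopFontWarnings {

    // Assumed message format, taken from the warnings quoted above:
    //   Font "<requested triplet>" not found. Substituting with "<fallback triplet>".
    private static final Pattern SUBSTITUTION = Pattern.compile(
            "Font \"([^\"]+)\" not found\\. Substituting with \"([^\"]+)\"");

    /** Returns one "requested -> fallback" entry per log line that matches the pattern. */
    static List<String> substitutions(List<String> logLines) {
        List<String> result = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = SUBSTITUTION.matcher(line);
            if (m.find()) {
                result.add(m.group(1) + " -> " + m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Font \"Cambria Math,normal,700\" not found. Substituting with \"Cambria Math,normal,400\".",
            "WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Font \"Cambria Math,italic,400\" not found. Substituting with \"Cambria Math,normal,400\".",
            "WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Font \"Cambria Math,italic,700\" not found. Substituting with \"Cambria Math,normal,400\".");
        // Print each missing variant next to the face FOP used instead
        for (String s : substitutions(log)) {
            System.out.println(s);
        }
    }
}
```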

Re: hindi,kannda,tamil,telugu

Posted: Wed Feb 19, 2014 12:03 pm
by jason
I've attached a docx consisting of just the tricky characters of your original docx.

Following https://github.com/plutext/docx4j/commi ... 95166ba516 these characters are output to FO, but the font is incorrect.

Could you please go through the document, and tell me, for each run:
1. what language/script it is
2. what font is used on your system for it

For convenience, here is the w:t for the 4 runs:

Code:
First:
        <w:t>भारत ने स्वतंत्रता प्राप्ति के बाद के 63 वर्षों में बहुआयामी सामाजिक-आर्थिक प्रगति की है। भारत का क्षेत्रफल 32</w:t>

Second:
        <w:t> वर्ग कि.मी. है, जो हिमालय की हिमाच्छादित चोटियों से लेकर दक्षिण के उष्णकटिबंधीय सघन वनों तक फैला हुआ है। विश्व के</w:t>

Third:
        <w:t> सातवें विशालतम देश को पर्वत तथा समुद्र शेष एशिया से अलग करते हैं, जिससे इसका अलग भौगोलिक अस्तित्व है। इसके उत्तर में  हिमालय पर्वत शृंखला है, जहाँ से यह </w:t>

Fourth:
        <w:t> లబగదహల్ిబపలూలసలలమనవసమని ంమనలమన లమనలమనల ్ిలుల లలపలూసీ,తీతనగీసైరరీస</w:t>
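
As a starting point for question 1, the Unicode script of each run can be read from its characters with the JDK alone. The following is a minimal sketch that uses only `Character.UnicodeScript` and makes no docx4j calls. The class name `RunScripts` and its methods are hypothetical, and the sample strings are the opening characters of the first and fourth runs above. It identifies the script, not the language or the font, so question 2 still has to be answered from the machine that produced the document.

```java
import java.lang.Character.UnicodeScript;
import java.util.LinkedHashMap;
import java.util.Map;

public class RunScripts {

    /**
     * Counts code points per Unicode script, skipping COMMON and INHERITED
     * (spaces, digits, punctuation and marks that are shared between scripts).
     */
    static Map<UnicodeScript, Integer> countScripts(String text) {
        Map<UnicodeScript, Integer> counts = new LinkedHashMap<>();
        text.codePoints().forEach(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            if (script != UnicodeScript.COMMON && script != UnicodeScript.INHERITED) {
                counts.merge(script, 1, Integer::sum);
            }
        });
        return counts;
    }

    /** Returns the script with the most code points, or COMMON if no script-specific character was found. */
    static UnicodeScript dominantScript(String text) {
        return countScripts(text).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(UnicodeScript.COMMON);
    }

    public static void main(String[] args) {
        String[] runs = {
            "भारत ने स्वतंत्रता",   // opening of the first run
            "లబగదహల"              // opening of the fourth run
        };
        // Print the dominant script next to each sample
        for (String run : runs) {
            System.out.println(dominantScript(run) + " <- " + run);
        }
    }
}
```

The script name narrows down which complex script font a run needs, which can then be compared with the font FOP actually selected.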

Re: hindi,kannda,tamil,telugu

Posted: Fri Feb 21, 2014 11:15 pm
by dazzfayaz
I am facing a problem with the PDF. Can you please send me a PDF, converted with docx4j, that contains different languages and tables inside tables?
Thanks

Re: hindi,kannda,tamil,telugu

Posted: Sat Feb 22, 2014 7:17 am
by jason
I can't help you unless you answer my question.

jason wrote:Could you please go through the document, and tell me, for each run:
1. what language/script it is
2. what font is used on your system for it


dazzfayaz wrote:tables inside tables


I already responded to that in your other thread.

Re: hindi,kannda,tamil,telugu

Posted: Tue Feb 25, 2014 9:20 pm
by dazzfayaz
Hi Jason,

The complex script fonts used are: Mangal, Cambria Math, Gautami.
The languages used are: Hindi (India), Arabic (Saudi Arabia), Telugu (India), English (United States).
Attached are screenshots of the Reveal Formatting pane.

Thanks

Re: hindi,kannda,tamil,telugu

Posted: Tue Feb 25, 2014 9:36 pm
by dazzfayaz
The locale I am using on my machine is English (United States).
I have attached screenshots of the fonts installed on my machine.