
Arabic Characters in PDF

Posted: Fri Apr 24, 2015 3:00 am
by mmshabeer
Attachments:
testarabic.pdf (output file, 26.3 KiB)
testarabic.docx (input file, 13.84 KiB)

Hi,

I am using docx4j 3.2.0 and tried to generate a PDF from a docx using 'Docx4J.toFO'.
The Arabic characters are missing from the PDF.

Code Snippet:

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Map;

import org.docx4j.Docx4J;
import org.docx4j.convert.out.FOSettings;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.fonts.PhysicalFont;
import org.docx4j.fonts.PhysicalFonts;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

private static void createTestPDF() throws Exception {
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File("testarabic.docx"));

    // Print all available physical fonts
    PhysicalFonts.discoverPhysicalFonts();
    for (Map.Entry<String, PhysicalFont> entry : PhysicalFonts.getPhysicalFonts().entrySet()) {
        System.out.println("Key is " + entry.getKey() + ";; Name " + entry.getValue().getName());
    }

    // Map Times New Roman to a font that has the Arabic glyphs
    Mapper fontMapper = new IdentityPlusMapper();
    PhysicalFont font = PhysicalFonts.get("Arial Unicode MS");
    if (font != null) { // null if the font is not installed
        fontMapper.put("Times New Roman", font);
    }
    wordMLPackage.setFontMapper(fontMapper);

    FOSettings foSettings = Docx4J.createFOSettings();
    foSettings.setWmlPackage(wordMLPackage);
    System.out.println(foSettings.getSettings());

    // Close the output stream even if the conversion fails (Java 6, so no try-with-resources)
    OutputStream pdfOutputStream = new FileOutputStream("testarabic.pdf");
    try {
        Docx4J.toFO(foSettings, pdfOutputStream, Docx4J.FLAG_EXPORT_PREFER_XSL);
    } finally {
        pdfOutputStream.close();
    }

    System.out.println(" Done !!!!");
}


The input and output documents are attached.

Environment : Windows 7
Java Version: 1.6
Kindly help.

Thanks.

Re: Arabic Characters in PDF

Posted: Thu Apr 30, 2015 8:31 pm
by jason
Please see attached png, which shows that my docx4j PDF output is the same as the Word docx (as it appears on my system).

Note that I have font "Arabic Typesetting" installed on this machine, and it is used in the PDF output.

Not sure whether you see the асдфас stuff?

This is using current docx4j source code. If there are any differences from 3.2.0, that would be because of changes in https://github.com/plutext/docx4j/blob/ ... ector.java
(click the history button).

I see you already have something like:

// .. example of mapping font Times New Roman which doesn't have certain Arabic glyphs
// eg Glyph "ي" (0x64a, afii57450) not available in font "TimesNewRomanPS-ItalicMT".
// eg Glyph "ج" (0x62c, afii57420) not available in font "TimesNewRomanPS-ItalicMT".
// to a font which does
PhysicalFont font = PhysicalFonts.get("Arial Unicode MS");
// make sure this is in your regex (if any)!!!
if (font != null) {
    fontMapper.put("Times New Roman", font);
    //fontMapper.put("Arial", font);
}
 


Do you have Arial Unicode MS installed?

Try uncommenting fontMapper.put("Arial", font);
since Arial is used for the асдфас text.
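
A quick way to see which installed fonts actually cover the Arabic glyphs named in the log comments (0x64a and 0x62c) is a small check using only the JDK's java.awt.Font. This is a rough sketch, not part of docx4j: the class name FontCoverageCheck and its helper methods are made up for illustration, and AWT's view of installed fonts may differ from what PhysicalFonts.discoverPhysicalFonts() finds, so treat the result as indicative only. It checks the plain face, whereas the log lines above refer to the italic face; swap in Font.ITALIC to test that instead.

```java
import java.awt.Font;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of docx4j: reports Arabic code points a font cannot display.
class FontCoverageCheck {

    /** True if the code point lies in the basic Arabic block (U+0600 to U+06FF). */
    static boolean isArabic(int codePoint) {
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.ARABIC;
    }

    /** Collects the Arabic code points in text that the given font has no glyph for. */
    static List<Integer> missingArabicGlyphs(Font font, String text) {
        List<Integer> missing = new ArrayList<Integer>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isArabic(cp) && !font.canDisplay(cp)) {
                missing.add(cp);
            }
            i += Character.charCount(cp);
        }
        return missing;
    }

    public static void main(String[] args) {
        // U+064A and U+062C are the two glyphs reported as missing in the log comments
        String sample = "\u064A\u062C";
        String[] families = { "Times New Roman", "Arial", "Arial Unicode MS" };
        for (String family : families) {
            Font font = new Font(family, Font.PLAIN, 12);
            // AWT substitutes a default font when the requested family is not installed
            boolean installed = font.getFamily().equalsIgnoreCase(family);
            System.out.println(family + ": installed=" + installed
                    + ", missing Arabic code points=" + missingArabicGlyphs(font, sample));
        }
    }
}
```

If a family prints installed=false, mapping to it with fontMapper.put will not help, and a different Arabic-capable font (for example the "Arabic Typesetting" font mentioned above) would be the one to map to.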

Re: Arabic Characters in PDF

Posted: Fri Feb 02, 2018 9:46 pm
by Asttle
Can you list the dependencies for this code? I am using the same code, but I am getting exceptions.