BestMatchingMapper bugs handling explicit substitutions

Posted: Mon Dec 12, 2011 8:38 pm
by jeromyevans
The BestMatchingMapper class attempts to match commonly used fonts to their equivalents on systems that don't have Microsoft fonts installed.
Part of the matching is via equivalents specified in FontSubstitutions.xml.

Here's the extract of FontSubstitutions.xml that is referenced below:
   <replace name="arial">
      <SubstFonts>
         arial;nimbussansl;freesans;albanyamt;albany;helvetica;lucidasans;lucida;geneva;helmet;sansserif;nimbussans;andalesansui;arialunicodems;lucidaunicode</SubstFonts>
      <SubstFontsMS></SubstFontsMS>
                ...


I've encountered the following three bugs in this process:

(1) FontSubstitutions.xml keys each font by its name lowercased, with whitespace and punctuation removed, but the font name taken from the document is not normalized the same way before the lookup. As a result, "Times New Roman" in a document is not matched to the corresponding replace element for "timesnewroman". Similarly, "Arial" is not matched to "arial".
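
A minimal sketch of that naming convention, assuming the key is simply the font name lowercased with whitespace and ASCII punctuation removed. `FontKeys` is a hypothetical helper, not docx4j code; applying it to the document's font name before the lookup would let "Times New Roman" find the `timesnewroman` entry.

```java
import java.util.Locale;

// Hypothetical helper: derives a FontSubstitutions.xml-style key from a font name.
final class FontKeys {
    private FontKeys() {}

    // Assumption: the key is the name lowercased with whitespace and ASCII
    // punctuation (java.util.regex \p{Punct}) removed, e.g. "Times New Roman" -> "timesnewroman".
    static String normalize(String fontName) {
        return fontName.toLowerCase(Locale.ROOT).replaceAll("[\\s\\p{Punct}]+", "");
    }
}
```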

(2) Once a match is found, the method that searches PhysicalFonts for the substitute font also uses the short key rather than the proper name under which PhysicalFonts stores the font. For example, when matching "arial" to a substitute, it looks up "freesans" in the PhysicalFonts map instead of "Free Sans".
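
One way to bridge the two naming schemes is an index from short key back to proper name. A sketch, assuming the registered font names are available as plain strings (in docx4j they would be the keys of the PhysicalFonts map; here a list stands in for it). `PhysicalNameIndex` and `properName` are hypothetical names.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical index from FontSubstitutions.xml-style keys to the proper names
// under which fonts are registered (e.g. "freesans" -> "Free Sans").
final class PhysicalNameIndex {
    private final Map<String, String> byKey = new HashMap<>();

    PhysicalNameIndex(Collection<String> physicalFontNames) {
        for (String name : physicalFontNames) {
            // If two fonts collapse to the same key, the first one seen wins.
            byKey.putIfAbsent(key(name), name);
        }
    }

    // Assumed convention: lowercase, whitespace and ASCII punctuation removed.
    static String key(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[\\s\\p{Punct}]+", "");
    }

    // Resolves a short key (or any spelling of the name) to the registered proper name.
    Optional<String> properName(String shortKeyOrName) {
        return Optional.ofNullable(byKey.get(key(shortKeyOrName)));
    }
}
```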

(3) On the system tested, the SubstFonts value includes the leading whitespace from the XML file: in the extract above, the first token is "\n\t\tarial" instead of "arial" (it seems odd that the whitespace survives unmarshalling). This means the first substitution always fails to match a font. Since, by convention, the first token is usually the name of the font itself, on systems where msttcorefonts is installed the BestMatchingMapper fails to match the exact font, i.e. it can't match "arial" to "arial" because the substitute is named "\n\t\tarial".
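
Trimming each token after splitting sidesteps this regardless of how the XML text was unmarshalled. A minimal sketch; `SubstFontsParser` is a hypothetical name, and the input is assumed to be the raw text content of a `<SubstFonts>` element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the semicolon-separated <SubstFonts> value.
final class SubstFontsParser {
    private SubstFontsParser() {}

    // Splits on ';', trims surrounding whitespace (including the newline and
    // indentation before the first token) and skips empty entries.
    static List<String> parse(String substFonts) {
        List<String> result = new ArrayList<>();
        if (substFonts == null) {
            return result;
        }
        for (String token : substFonts.split(";")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

An empty element such as `<SubstFontsMS></SubstFontsMS>` yields an empty list rather than a single blank token.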

Tested with v2.7.1 on Linux 3.0.0-12-generic #20-Ubuntu SMP x86_64 GNU/Linux

I've worked around this, but it's not elegant. I added a map to BestMatchingMapper that holds the entries of the PhysicalFonts map keyed using a convention similar to FontSubstitutions.xml's, and modified BestMatchingMapper to use this key whenever it deals with values in or from FontSubstitutions.xml. The patch for the work-around is attached.
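
Putting the three fixes together, the lookup can be sketched as follows. This is not the attached patch: the substitution table is modelled as a plain map from (already normalized) `replace` name to raw `<SubstFonts>` text, the installed fonts as a list of names, and `SubstitutionResolver` / `resolve` are hypothetical names.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical resolver illustrating the work-around: every comparison with
// FontSubstitutions.xml data uses normalized keys, and the result is translated
// back to the proper name under which the physical font is registered.
final class SubstitutionResolver {
    // Assumed to be keyed by normalized replace names, e.g. "arial" -> "\n  arial;freesans;..."
    private final Map<String, String> substFontsByReplaceName;
    private final Map<String, String> physicalNameByKey = new HashMap<>();

    SubstitutionResolver(Map<String, String> substFontsByReplaceName,
                         Collection<String> physicalFontNames) {
        this.substFontsByReplaceName = substFontsByReplaceName;
        for (String name : physicalFontNames) {
            physicalNameByKey.putIfAbsent(key(name), name);
        }
    }

    // Assumed convention: lowercase, whitespace and ASCII punctuation removed.
    static String key(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[\\s\\p{Punct}]+", "");
    }

    // Returns the proper name of the first installed substitute, in file order.
    Optional<String> resolve(String documentFontName) {
        // Bug (1): normalize the document's font name before the lookup.
        String substFonts = substFontsByReplaceName.get(key(documentFontName));
        if (substFonts == null) {
            return Optional.empty();
        }
        for (String token : substFonts.split(";")) {
            // Bug (3): key() also strips the leading newline and indentation.
            String candidate = key(token);
            if (candidate.isEmpty()) {
                continue;
            }
            // Bug (2): map the short key back to the registered proper name.
            String proper = physicalNameByKey.get(candidate);
            if (proper != null) {
                return Optional.of(proper);
            }
        }
        return Optional.empty();
    }
}
```

With "arial" mapped to the SubstFonts text from the extract, `resolve("Arial")` returns "Arial" when that font is installed and falls through to "Free Sans" when only Free Sans is present.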

When it works, this is the result (on a Linux system with the msttcorefonts package):
[BestMatchingMapper] Mapped Wingdings -->  OpenSymbol( file:/usr/share/fonts/truetype/openoffice/opens___.ttf
[BestMatchingMapper] Mapped Times New Roman -->  Times New Roman( file:/usr/share/fonts/truetype/msttcorefonts/times.ttf
[BestMatchingMapper] Mapped Courier New -->  Courier New( file:/usr/share/fonts/truetype/msttcorefonts/Courier_New.ttf
[BestMatchingMapper] Mapped Symbol -->  Symbol( file:/usr/share/fonts/X11/Type1/Symbol.pfb
[BestMatchingMapper] Mapped Arial -->  Arial( file:/usr/share/fonts/truetype/msttcorefonts/arial.ttf


Regards,
Jeromy Evans

PS. Patches are provided in accordance with the CLA http://dev.plutext.org/docx4j/docx4j_In ... butor.docx

Re: BestMatchingMapper bugs handling explicit substitutions

Posted: Fri Mar 30, 2012 12:18 pm
by jason
Hi Jeromy

Thank you very much for taking the time to create this patch and to document it so thoroughly.

This made it simple to apply, which I have (belatedly) done in http://www.docx4java.org/trac/docx4j/changeset/1767

cheers .. Jason