Unicode characters in xml binding

Posted: Fri Nov 14, 2014 7:05 am
by jbrichau
Hi,

I have an issue binding XML parts that contain Unicode characters beyond the Latin-1 extended set.
I'm using the OpenDope Authoring plugin in Word to create a template, and a docx4j client program, similar to the one in [1], to perform the actual merge.

The Unicode character U+2318 is contained in an XML element to be merged. When I use it inside the OpenDope plugin in Word, it gets inserted correctly in the Word document. When I run the merge with docx4j, the character is not recognized when I open the resulting Word file.

The XML file is correctly UTF-8 encoded.

Is this a known issue? What could cause it?
Thanks for any clues.

[1] https://github.com/plutext/docx4j/blob/ ... geXML.java
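One way to double-check the claim that the XML really carries U+2318 as UTF-8 is to decode the raw bytes and look for the code point. The following is only a minimal sketch using the Java standard library; the class name `Utf8Check` and the sample bytes are hypothetical and not taken from the poster's program.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of docx4j.
final class Utf8Check {

    // Decodes the bytes as UTF-8 and reports whether the given code point occurs.
    static boolean containsCodePoint(byte[] utf8Bytes, int codePoint) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.codePoints().anyMatch(cp -> cp == codePoint);
    }

    public static void main(String[] args) {
        // U+2318 is encoded in UTF-8 as the three bytes E2 8C 98.
        byte[] sample = {(byte) 0xE2, (byte) 0x8C, (byte) 0x98};
        System.out.println(containsCodePoint(sample, 0x2318));
    }
}
```

In practice the byte array would come from the XML file itself, for example via `java.nio.file.Files.readAllBytes`.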

Re: Unicode characters in xml binding

Posted: Fri Nov 14, 2014 7:35 pm
by jason
I merged invoice-data.xml into invoice.docx, from https://github.com/plutext/docx4j/tree/ ... atabinding

My modified invoice-data.xml contained:
<name>app &#2318; les</name>
 

The resulting Word docx contained the character ऎ

I think that's ok, given that SC uniPad reports it as http://www.fileformat.info/info/unicode ... /index.htm

(However, it doesn't look like the character shown on that web page)
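A likely explanation for the mismatch: in XML, `&#2318;` is a decimal character reference, so it denotes code point 2318 (U+090E, DEVANAGARI LETTER SHORT E), whereas U+2318 would be written as the hexadecimal reference `&#x2318;`. The sketch below illustrates the difference with the Java standard library only; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of docx4j.
final class NumericCharRef {

    // Decodes an XML numeric character reference such as "&#2318;" or "&#x2318;".
    static int codePointOf(String ref) {
        if (!ref.startsWith("&#") || !ref.endsWith(";")) {
            throw new IllegalArgumentException("not a numeric character reference: " + ref);
        }
        String body = ref.substring(2, ref.length() - 1);
        // A leading lowercase 'x' marks a hexadecimal reference; otherwise it is decimal.
        return body.startsWith("x")
                ? Integer.parseInt(body.substring(1), 16)
                : Integer.parseInt(body, 10);
    }

    // Formats the code point in U+XXXX notation together with its Unicode name.
    static String describe(String ref) {
        int cp = codePointOf(ref);
        return String.format("U+%04X %s", cp, Character.getName(cp));
    }

    public static void main(String[] args) {
        System.out.println(describe("&#2318;"));   // decimal reference, code point U+090E
        System.out.println(describe("&#x2318;"));  // hexadecimal reference, code point U+2318
    }
}
```

So to test the character from the original question, the element would need to contain `&#x2318;` (or the literal character) rather than `&#2318;`.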

Re: Unicode characters in xml binding

Posted: Wed Nov 19, 2014 10:20 am
by jbrichau
Hi Jason,

Thanks for your reply.

If I try that in my document, the character is displayed as '?'. The weird thing is that when I copy and paste that character into another tool, or even within the same document, it shows up as ⌘ (what it should be). So it seems that the correct character is in there, just not displayed correctly, until I copy and paste it; the pasted copy is then shown correctly.

Johan

Re: Unicode characters in xml binding

Posted: Wed Nov 19, 2014 11:05 am
by jason
Seems like you just need to have the correct w:rPr/w:rFonts setting for the content control, so that a suitable font (one that actually contains a glyph for the character) is used.
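As a rough illustration of that suggestion, the sketch below builds a `w:rPr` fragment whose `w:rFonts` element names the same font for all four WordprocessingML font slots (`w:ascii`, `w:hAnsi`, `w:eastAsia`, `w:cs`). It is plain string building, not docx4j API; the class name is hypothetical, and the font "Segoe UI Symbol" is only an assumption about a font that has a glyph for U+2318 on the machine opening the document. Whether the fragment belongs in the content control's `w:sdtPr` or in the run properties of the bound content depends on the template.

```java
// Hypothetical helper, not part of docx4j.
final class RunFonts {

    // Builds a w:rPr fragment that selects one font for every w:rFonts slot.
    static String rPrWithFont(String fontName) {
        // Escape the characters that are not allowed raw in a double-quoted attribute value.
        String f = fontName.replace("&", "&amp;")
                           .replace("<", "&lt;")
                           .replace("\"", "&quot;");
        return "<w:rPr>"
                + "<w:rFonts w:ascii=\"" + f + "\" w:hAnsi=\"" + f + "\""
                + " w:eastAsia=\"" + f + "\" w:cs=\"" + f + "\"/>"
                + "</w:rPr>";
    }

    public static void main(String[] args) {
        // Assumed font name; substitute one known to contain the required glyph.
        System.out.println(rPrWithFont("Segoe UI Symbol"));
    }
}
```

Setting all four slots avoids having to guess which slot Word picks for a given character.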