
How to read special symbols (e.g. α, γ) from a .docx file

Posted: Wed Oct 29, 2014 11:29 pm
by veer
Hi,
I'm unable to read special symbols from a Word file (.docx) using docx4j.

Text: vitamin D3, PPARγ, HNF-4α, glucocorticoid.

Output: vitamin D 3 , PPAR ? , HNF-4 , glucocorticoid. (the special symbols are missing)

I also tried XmlUtils.marshaltoString(jaxbNode, true, true), but that didn't give me the symbols either.
I can't figure out what to do.

I have attached the source file.
Is there any solution?


Thanks.

Re: How to read special symbols (e.g. α, γ) from a .docx file

Posted: Thu Oct 30, 2014 11:19 pm
by jason
There are two different cases here.

In the first case, your source XML contains:

            <w:r >
              <w:t>γ</w:t>
            </w:r>
 


That's just an ordinary character encoded in UTF-8, so you should get it as long as you don't corrupt the encoding, for example by writing your output with a charset other than UTF-8.
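A minimal sketch, using only the JDK (no docx4j), of how a lost encoding can turn γ into '?'. It writes the text through an `OutputStreamWriter` in a given charset and reads it back. US-ASCII here is only a stand-in for any charset that cannot represent γ; which charset actually caused the problem in the original code isn't known. `EncodingDemo` and `roundTrip` are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Writes text through a Writer in the given charset, then decodes the
    // resulting bytes with the same charset, as a reader of that output would.
    static String roundTrip(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream, but Writer declares it.
            throw new UncheckedIOException(e);
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String text = "PPAR\u03B3"; // "PPARγ", as in the w:t case
        // UTF-8 can encode every character, so the gamma survives.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // prints "true"
        // US-ASCII cannot encode gamma; the writer substitutes '?' instead.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII)); // prints "PPAR?"
    }
}
```

The '?' in "PPAR ?" from the original post is the typical symptom of this kind of substitution, so explicitly passing `StandardCharsets.UTF_8` wherever you create writers or convert bytes is the usual fix.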

The second case uses:

            <w:r>
              <w:sym w:font="Symbol" w:char="F061"/>
            </w:r>
 


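Here the run contains no text at all: w:sym names a font (Symbol) and a character code within that font (F061, which is α), so you need to process the w:sym element yourself.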
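A minimal sketch of handling the w:sym case, assuming you have already pulled the w:font and w:char attribute values out of the run as plain strings (how to get them from docx4j's object model is not shown). The lookup table is a hypothetical, deliberately tiny subset of the Symbol font's Greek letters, and `SymMapper`/`symToText` are illustrative names, not docx4j API. `Map.of` assumes Java 9 or later.

```java
import java.util.Map;

public class SymMapper {

    // Illustrative subset only: the Symbol font slot of each Greek letter,
    // mapped to the corresponding Unicode character.
    private static final Map<Integer, Character> SYMBOL_TO_UNICODE = Map.of(
            0x61, '\u03B1',  // 'a' slot -> alpha
            0x62, '\u03B2',  // 'b' slot -> beta
            0x64, '\u03B4',  // 'd' slot -> delta
            0x67, '\u03B3',  // 'g' slot -> gamma
            0x6D, '\u03BC'   // 'm' slot -> mu
    );

    // Converts the w:font and w:char attribute values of a w:sym element to text.
    // Returns null when the font or character is not covered by the table.
    static String symToText(String font, String charHex) {
        int code = Integer.parseInt(charHex, 16);
        // Word often stores symbol-font characters in the private use area as
        // U+F000 plus the font's own slot, e.g. F061 for Symbol's 'a' slot.
        if (code >= 0xF000 && code <= 0xF0FF) {
            code -= 0xF000;
        }
        if (!"Symbol".equalsIgnoreCase(font)) {
            return null;
        }
        Character c = SYMBOL_TO_UNICODE.get(code);
        return c == null ? null : String.valueOf(c);
    }

    public static void main(String[] args) {
        // Attribute values from <w:sym w:font="Symbol" w:char="F061"/>
        String text = symToText("Symbol", "F061");
        System.out.println("\u03B1".equals(text)); // prints "true"
    }
}
```

A real implementation would need the complete Symbol table, plus tables for other symbol fonts such as Wingdings. Returning null lets the caller decide whether to drop an unknown symbol or emit a placeholder.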

See for example https://github.com/plutext/docx4j/blob/ ... riter.java

Note also that if you want to use the correct font in your output, the algorithm involved is complex. See https://github.com/plutext/docx4j/blob/ ... ector.java

Re: How to read special symbols (e.g. α, γ) from a .docx file

Posted: Fri Oct 31, 2014 4:17 pm
by veer
Thanks a lot for your help, jason.
I was having a UTF-8 encoding issue. I finally got it working.

Thanks