Handling symbols.

Posted: Fri May 11, 2012 10:46 pm
by keerthi
Hi All,
While trying to extract the contents of a docx document, I am having trouble converting symbols inserted through Word's "Insert Symbol" feature. The document.xml file contains the following for the symbol α (alpha):

<w:r w:rsidR="00A223B5" w:rsidRPr="00675051">
    <w:rPr><w:rFonts w:cs="Arial"/><w:szCs w:val="22"/></w:rPr>
    <w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r w:rsidR="00231886" w:rsidRPr="00675051">
    <w:rPr><w:rFonts w:cs="Arial"/><w:szCs w:val="22"/></w:rPr>
    <w:instrText>symbol 97 \f "Symbol" \s 11</w:instrText>
</w:r>
<w:r w:rsidR="00A223B5" w:rsidRPr="00675051">
    <w:rPr><w:rFonts w:cs="Arial"/><w:szCs w:val="22"/></w:rPr>
    <w:fldChar w:fldCharType="end"/>
</w:r>

Please help me convert this data to a string value, in this case "α" (alpha).

Thanks in advance.
Regards,
Keerthi. G

Re: Handling symbols.

Posted: Fri May 18, 2012 2:57 pm
by jason
As you can see, your symbol is in

<w:instrText>symbol 97 \f "Symbol" \s 11</w:instrText>

Presumably it is character 97 in the Symbol font (97 is the code for "a", which that font draws as α).

What encoding are you using for your string? Does the question just come down to how to map characters from the Symbol font to, say, UTF-8?

A quick way to convert may be to save as HTML from within Word.

Let us know what you discover.
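
For what it's worth, the mapping idea above can be sketched roughly as follows. This is only a minimal illustration, not docx4j API: the class name SymbolFieldDecoder, its decode method, and the five-entry lookup table are all hypothetical, and the table covers just a handful of Symbol font codes. It also assumes the character code in the field is decimal, as in the instrText quoted in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of docx4j.
class SymbolFieldDecoder {

    // Partial, hand-written table: Symbol font character code -> Unicode string.
    // A real converter would need the complete Symbol font mapping.
    private static final Map<Integer, String> SYMBOL_TO_UNICODE = new HashMap<>();
    static {
        SYMBOL_TO_UNICODE.put(97, "\u03B1");  // 'a' -> alpha
        SYMBOL_TO_UNICODE.put(98, "\u03B2");  // 'b' -> beta
        SYMBOL_TO_UNICODE.put(100, "\u03B4"); // 'd' -> delta
        SYMBOL_TO_UNICODE.put(103, "\u03B3"); // 'g' -> gamma
        SYMBOL_TO_UNICODE.put(112, "\u03C0"); // 'p' -> pi
    }

    // Matches field instructions such as: symbol 97 \f "Symbol" \s 11
    // Group 1 is the decimal character code, group 2 the font name.
    private static final Pattern SYMBOL_FIELD = Pattern.compile(
            "symbol\\s+(\\d+).*?\\\\f\\s+\"([^\"]+)\"",
            Pattern.CASE_INSENSITIVE);

    // Returns the Unicode string for a SYMBOL field instruction, or null if
    // the text is not a SYMBOL field, names another font, or uses a code
    // that is missing from the partial table above.
    static String decode(String instrText) {
        Matcher m = SYMBOL_FIELD.matcher(instrText);
        if (!m.find()) {
            return null;
        }
        if (!"Symbol".equalsIgnoreCase(m.group(2))) {
            return null;
        }
        int code = Integer.parseInt(m.group(1));
        return SYMBOL_TO_UNICODE.get(code);
    }

    public static void main(String[] args) {
        // The instruction text from the first post in this thread.
        String instr = "symbol 97 \\f \"Symbol\" \\s 11";
        System.out.println(decode(instr));
    }
}
```

The text passed to decode would be whatever you read out of the w:instrText element between the "begin" and "end" fldChar runs. A Java String is UTF-16 internally, so the choice of UTF-8 only matters at the point where you write the result out.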

Re: Handling symbols.

Posted: Tue Aug 06, 2013 5:13 pm
by dazzfayaz
Hi,

A few symbols are not showing up in my docx, even though they are present in document.xml.
Please help me with what to do to support these symbols.
Attached is the docx, which contains the symbols ⊥ and ⊇, but when the docx is opened they display as unknown symbols.

Thanks

Re: Handling symbols.

Posted: Tue Aug 06, 2013 10:50 pm
by jason
Copy/pasting the symbols from this thread into Word yields:

<w:r>
    <w:rPr>
        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math" w:cs="Cambria Math"/>
    </w:rPr>
    <w:t>⊥ ⊇</w:t>
</w:r>
 


Or did you want something else?

Re: Handling symbols.

Posted: Tue Aug 06, 2013 10:52 pm
by jason
import javax.xml.bind.JAXBElement;
import org.docx4j.wml.R;
import org.docx4j.wml.RFonts;
import org.docx4j.wml.RPr;
import org.docx4j.wml.Text;

org.docx4j.wml.ObjectFactory wmlObjectFactory = new org.docx4j.wml.ObjectFactory();

// Create the run (w:r)
R r = wmlObjectFactory.createR();

// Create the run properties (w:rPr) and attach them to the run
RPr rpr = wmlObjectFactory.createRPr();
r.setRPr(rpr);

// Set the fonts (w:rFonts) to Cambria Math, which has glyphs for these symbols
RFonts rfonts = wmlObjectFactory.createRFonts();
rpr.setRFonts(rfonts);
rfonts.setAscii("Cambria Math");
rfonts.setCs("Cambria Math");
rfonts.setHAnsi("Cambria Math");

// Create the text (w:t), wrapped in a JAXBElement, and add it to the run
Text text = wmlObjectFactory.createText();
text.setValue("⊥ ⊇");
JAXBElement<org.docx4j.wml.Text> textWrapped = wmlObjectFactory.createRT(text);
r.getContent().add(textWrapped);

 