Math expression issue..

Posted: Sun Nov 01, 2009 9:11 am
by dqkit
Hi all.
docx4j handles m:oMath, and does it very well. But I would like to know whether docx4j can also read math expressions created by MathType.
In the document, it looks like this:
[Attached image: QQ截图未命名.png]
And here is the XML content:
Code:
    <w:p w:rsidR="00AE3E38" w:rsidRDefault="00AE3E38">
      <w:r w:rsidRPr="00AE3E38">
        <w:rPr>
          <w:position w:val="-32"/>
        </w:rPr>
        <w:object w:dxaOrig="2480" w:dyaOrig="639">
          <v:shapetype id="_x0000_t75" coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe" filled="f" stroked="f">
            <v:stroke joinstyle="miter"/>
            <v:formulas>
              <v:f eqn="if lineDrawn pixelLineWidth 0"/>
              <v:f eqn="sum @0 1 0"/>
              <v:f eqn="sum 0 0 @1"/>
              <v:f eqn="prod @2 1 2"/>
              <v:f eqn="prod @3 21600 pixelWidth"/>
              <v:f eqn="prod @3 21600 pixelHeight"/>
              <v:f eqn="sum @0 0 1"/>
              <v:f eqn="prod @6 1 2"/>
              <v:f eqn="prod @7 21600 pixelWidth"/>
              <v:f eqn="sum @8 21600 0"/>
              <v:f eqn="prod @7 21600 pixelHeight"/>
              <v:f eqn="sum @10 21600 0"/>
            </v:formulas>
            <v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>
            <o:lock v:ext="edit" aspectratio="t"/>
          </v:shapetype>
          <v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:123.75pt;height:32.25pt" o:ole="">
            <v:imagedata r:id="rId6" o:title=""/>
          </v:shape>
          <o:OLEObject Type="Embed" ProgID="Equation.DSMT4" ShapeID="_x0000_i1025" DrawAspect="Content" ObjectID="_1318580336" r:id="rId7"/>
        </w:object>
      </w:r>
    </w:p>



---Kit.Liao

Re: Math expression issue..

Posted: Sun Nov 01, 2009 11:54 am
by jason
That is an OLE object:

Code:
<o:OLEObject Type="Embed" ProgID="Equation.DSMT4" ShapeID="_x0000_i1025" DrawAspect="Content" ObjectID="_1318580336" r:id="rId7"/>


so I suggest you look at http://dev.plutext.org/forums/viewtopic.php?f=6&t=72

In short, there's a bit of work to do, but it should be feasible (at least on Windows).
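
For readers who want to spot such objects programmatically, here is a minimal sketch that uses only the JDK's DOM parser rather than docx4j. It looks for o:OLEObject elements and reports the ProgID and r:id of each. The class name OleObjectFinder and the small wrapper fragment in main are made up for illustration; the namespace URIs are the usual ones bound to the o: and r: prefixes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of docx4j: lists the OLE objects in a WordprocessingML fragment.
final class OleObjectFinder {
    static final String NS_O = "urn:schemas-microsoft-com:office:office";
    static final String NS_R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    /** Returns one {ProgID, relationship id} pair per o:OLEObject element, in document order. */
    static List<String[]> find(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so the o: and r: prefixes resolve
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(NS_O, "OLEObject");
            List<String[]> found = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element ole = (Element) nodes.item(i);
                // ProgID has no prefix, so it is read without a namespace; r:id needs one.
                found.add(new String[] { ole.getAttribute("ProgID"), ole.getAttributeNS(NS_R, "id") });
            }
            return found;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse the XML fragment", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative fragment: the namespace declarations are added so it parses on its own.
        String xml = "<w:object xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
                + " xmlns:o=\"" + NS_O + "\" xmlns:r=\"" + NS_R + "\">"
                + "<o:OLEObject Type=\"Embed\" ProgID=\"Equation.DSMT4\" r:id=\"rId7\"/>"
                + "</w:object>";
        for (String[] ole : find(xml)) {
            System.out.println(ole[0] + " -> " + ole[1]);
        }
    }
}
```

The relationship id returned here (rId7 in the XML above) is what leads to the embedded binary part.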

Re: Math expression issue..

Posted: Mon Nov 02, 2009 5:09 am
by dqkit
Thanks for the reply, jason.
My plan was:
first, to get the OLE part;
second, to pass its binary stream to POI;
and finally, to use POI to read the content.
I tried to do exactly that, but had no luck.
Code:
if (null != this.wmlpackage) {
    OleObjectBinaryPart olePart = (OleObjectBinaryPart) this.wmlpackage.getParts().get(new PartName("/word/embeddings/oleObject1.bin"));
    System.out.println(olePart.getFs());
}

It printed null.

jason, could you tell me the right way to do this?

Cheers,
Kit.

Re: Math expression issue..

Posted: Mon Nov 02, 2009 6:16 am
by jason
You'll have to be prepared to experiment and learn for yourself here, because OleObjectBinaryPart will only take you so far (please share what you learn here). But looking at the source code, you need to call its initPOIFSFileSystem() before calling getFs().

Re: Math expression issue..

Posted: Mon Nov 02, 2009 7:15 am
by dqkit
Thank you, jason.
I will share what I find here.
I hope it helps anybody who wants to do the same thing. 8-)

Re: Math expression issue..

Posted: Tue Nov 03, 2009 5:49 am
by dqkit
I used POIFSFileSystem and it gave me a lot of information.
As you know, OLE, COM and ActiveX have a similar structure.
Here is some information about the OLE block data:
POIFSDocument:
poiDocument.getSize()=480
poiDocument.getShortDescription()=Document: "Equation Native" size = 480
00000000 1C 00 00 00 02 00 7E C1 C4 01 00 00 0C 00 E0 00 ......~.........
00000010 10 86 52 00 00 00 00 00 7C 00 E0 00 05 01 00 06 ..R.....|.......
00000020 00 44 53 4D 54 36 00 00 13 57 69 6E 41 6C 6C 43 .DSMT6...WinAllC
00000030 6F 64 65 50 61 67 65 73 00 11 05 CB CE CC E5 00 odePages........
00000040 11 03 53 79 6D 62 6F 6C 00 13 57 69 6E 41 6C 6C ..Symbol..WinAll
00000050 42 61 73 69 63 43 6F 64 65 50 61 67 65 73 00 11 BasicCodePages..
00000060 06 43 6F 75 72 69 65 72 20 4E 65 77 00 11 06 54 .Courier New...T
00000070 69 6D 65 73 20 4E 65 77 20 52 6F 6D 61 6E 00 11 imes New Roman..
00000080 04 4D 54 20 45 78 74 72 61 00 13 57 69 6E 43 68 .MT Extra..WinCh
00000090 69 6E 61 00 11 07 CB CE CC E5 2D B7 BD D5 FD B3 ina.......-.....
000000A0 AC B4 F3 D7 D6 B7 FB BC AF 00 12 00 08 22 5F 45 ............."_E
000000B0 8F 44 2F 41 50 F4 10 0F 47 5F 41 50 F2 1F 1E 41 .D/AP...G_AP...A
000000C0 50 F4 15 0F 41 00 F4 45 F4 25 F4 8F 42 5F 41 00 P...A..E.%..B_A.
000000D0 F4 10 0F 43 5F 41 00 F4 8F 45 F4 2A 5F 48 F4 8F ...C_A...E.*_H..
000000E0 41 00 F4 10 0F 40 F4 8F 41 7F 48 F4 10 0F 41 2A A....@..A.H...A*
000000F0 5F 44 5F 45 F4 5F 45 F4 5F 41 0F 0C 01 00 01 00 _D_E._E._A......
00000100 01 02 02 02 02 00 02 00 01 01 01 00 03 00 04 00 ................
00000110 05 00 06 00 0A 01 00 10 04 E8 03 00 00 00 00 52 ...............R
00000120 65 64 00 0F 01 02 00 83 66 00 02 00 82 28 00 02 ed......f....(..
00000130 00 83 6E 00 02 00 82 29 00 02 04 86 3D 00 3D 03 ..n....)....=.=.
00000140 00 17 10 00 01 00 02 02 82 6C 00 02 00 82 69 00 .........l....i.
00000150 02 00 82 6D 00 00 0B 01 00 02 00 83 78 00 02 04 ...m........x...
00000160 86 AE 00 D2 02 00 82 A5 00 00 01 01 00 0A 03 00 ................
00000170 0A 00 00 01 00 02 00 83 62 00 03 00 1C 00 00 0B ........b.......
00000180 01 01 01 00 02 00 88 32 00 00 00 0A 02 00 82 2D .......2.......-
00000190 00 02 00 88 34 00 02 00 83 61 00 02 00 83 63 00 ....4....a....c.
000001A0 00 0B 01 01 00 0A 03 00 0F 51 00 10 04 E8 03 E8 .........Q......
000001B0 03 00 00 59 65 6C 6C 6F 77 00 0F 02 01 00 02 00 ...Yellow.......
000001C0 83 73 00 02 00 83 64 00 00 0B 0F 01 01 00 02 00 .s....d.........
000001D0 83 73 00 00 01 01 0D 02 04 86 2B 22 F2 00 00 00 .s........+"....
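
As a rough aid to reading this dump, here is a sketch that decodes the first bytes of the "Equation Native" stream. It assumes the layout I understand MathType to use for OLE storage: a 28-byte little-endian header (header length, version, clipboard format, MTEF data length, then reserved fields) followed by the MTEF data. That layout is an assumption, not something stated in this thread, and the class name EquationNativeHeader is made up. The dump itself is at least consistent with it, since the header length and MTEF length it yields add up to the 480 bytes reported above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: decodes the fixed header at the start of an "Equation Native" stream.
final class EquationNativeHeader {
    // First 28 bytes of the dump above (offsets 0x00 to 0x1B).
    static final String SAMPLE_HEX = "1C 00 00 00 02 00 7E C1 C4 01 00 00 0C 00 E0 00 "
            + "10 86 52 00 00 00 00 00 7C 00 E0 00";

    final int headerSize;      // length of this header in bytes
    final long version;        // header version field
    final int clipboardFormat; // clipboard format id
    final long mtefSize;       // length of the MTEF data that follows the header

    private EquationNativeHeader(int headerSize, long version, int clipboardFormat, long mtefSize) {
        this.headerSize = headerSize;
        this.version = version;
        this.clipboardFormat = clipboardFormat;
        this.mtefSize = mtefSize;
    }

    /** Reads the leading fields, assuming little-endian 16-bit and 32-bit unsigned values. */
    static EquationNativeHeader parse(byte[] stream) {
        if (stream.length < 12) {
            throw new IllegalArgumentException("stream too short for the header fields");
        }
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        int headerSize = buf.getShort() & 0xFFFF;
        long version = buf.getInt() & 0xFFFFFFFFL;
        int clipboardFormat = buf.getShort() & 0xFFFF;
        long mtefSize = buf.getInt() & 0xFFFFFFFFL;
        return new EquationNativeHeader(headerSize, version, clipboardFormat, mtefSize);
    }

    /** Converts space-separated hex pairs, as printed in a dump, to bytes. */
    static byte[] fromHex(String hex) {
        String[] tokens = hex.trim().split("\\s+");
        byte[] out = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = (byte) Integer.parseInt(tokens[i], 16);
        }
        return out;
    }

    public static void main(String[] args) {
        EquationNativeHeader h = parse(fromHex(SAMPLE_HEX));
        System.out.println("header=" + h.headerSize + " mtef=" + h.mtefSize
                + " total=" + (h.headerSize + h.mtefSize));
        // prints "header=28 mtef=452 total=480"
    }
}
```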

Re: Math expression issue..

Posted: Tue Nov 03, 2009 12:19 pm
by jason
Hi .. What is your objective here? If it is to extract an equation and render it however possible - as an image say - then, going back to your original XML, does
<v:imagedata r:id="rId6" o:title=""/>

contain an image of the equation by any chance?
cheers .. Jason

Re: Math expression issue..

Posted: Tue Nov 03, 2009 2:18 pm
by dqkit
Well, what I actually want to find out is which font and font size the equation uses.
So an image alone is not enough. ;)
By the way, can docx4j get the <v:shapetype> element in the document tree, or should I go to the OLE part?

Cheers,
--Kit.

Re: Math expression issue..

Posted: Tue Nov 03, 2009 10:58 pm
by jason
dqkit wrote: Well, what I actually want to find out is which font and font size the equation uses.


I guess you'll have to understand MathType's file format a bit to work that out
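
As a starting point for the font question, here is a hedged sketch that lists the font names declared near the start of the MTEF data in the dump posted earlier. It assumes MTEF version 5 as I understand it: a header of five bytes, a null-terminated application key and an options byte, followed by records where type 0x13 is an encoding definition (a null-terminated name) and type 0x11 is a font definition (an encoding index, then a null-terminated name). Those record layouts are assumptions that happen to match the dump, and the class name MtefFontLister is made up. The sketch stops at the first record it does not understand and does not decode font sizes, which are not part of the font definition records.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists font names declared at the start of an MTEF v5 stream.
final class MtefFontLister {
    static final int FONT_DEF = 0x11;     // assumed record type for a font definition
    static final int ENCODING_DEF = 0x13; // assumed record type for an encoding definition

    // Bytes 0x1C to 0xAA of the dump: everything after the 28-byte OLE header, up to the 0x12 record.
    static final String SAMPLE_HEX = "05 01 00 06 00 44 53 4D 54 36 00 00 "
            + "13 57 69 6E 41 6C 6C 43 6F 64 65 50 61 67 65 73 00 "
            + "11 05 CB CE CC E5 00 "
            + "11 03 53 79 6D 62 6F 6C 00 "
            + "13 57 69 6E 41 6C 6C 42 61 73 69 63 43 6F 64 65 50 61 67 65 73 00 "
            + "11 06 43 6F 75 72 69 65 72 20 4E 65 77 00 "
            + "11 06 54 69 6D 65 73 20 4E 65 77 20 52 6F 6D 61 6E 00 "
            + "11 04 4D 54 20 45 78 74 72 61 00 "
            + "13 57 69 6E 43 68 69 6E 61 00 "
            + "11 07 CB CE CC E5 2D B7 BD D5 FD B3 AC B4 F3 D7 D6 B7 FB BC AF 00 "
            + "12";

    /** mtef must begin at the MTEF header, i.e. after the OLE header has been skipped. */
    static List<String> fontNames(byte[] mtef, Charset charset) {
        if (mtef.length < 5 || (mtef[0] & 0xFF) != 5) {
            throw new IllegalArgumentException("only MTEF version 5 is handled in this sketch");
        }
        int pos = 5;                       // version, platform, product, product version, subversion
        pos = endOfCString(mtef, pos) + 1; // application key ("DSMT6" in the dump)
        pos++;                             // equation options byte
        List<String> fonts = new ArrayList<>();
        while (pos < mtef.length) {
            int type = mtef[pos++] & 0xFF;
            if (type == ENCODING_DEF) {
                pos = endOfCString(mtef, pos) + 1; // skip the encoding name
            } else if (type == FONT_DEF) {
                pos++;                             // encoding index, assumed to fit in one byte
                int end = endOfCString(mtef, pos);
                fonts.add(new String(mtef, pos, end - pos, charset));
                pos = end + 1;
            } else {
                break;                             // first record this sketch does not understand
            }
        }
        return fonts;
    }

    private static int endOfCString(byte[] data, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("unterminated string at offset " + from);
    }

    static byte[] fromHex(String hex) {
        String[] tokens = hex.trim().split("\\s+");
        byte[] out = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = (byte) Integer.parseInt(tokens[i], 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // GBK is a guess based on the "WinChina" encoding name; fall back if the JRE lacks it.
        Charset cs = Charset.isSupported("GBK") ? Charset.forName("GBK") : StandardCharsets.ISO_8859_1;
        fontNames(fromHex(SAMPLE_HEX), cs).forEach(System.out::println);
    }
}
```

On the dump above this yields six names, including Symbol, Courier New, Times New Roman and MT Extra. The charset is a parameter because two of the names are not ASCII: the bytes CB CE CC E5 look like GBK-encoded 宋体 (SimSun).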

dqkit wrote: By the way, can docx4j get the <v:shapetype> element in the document tree?


Should be able to. See org.docx4j.vml.CTShapetype

Re: Math expression issue..

Posted: Wed Nov 04, 2009 8:08 am
by dqkit
Note that it does have a <v:imagedata r:id="rId6" o:title=""/> element, but the src attribute is never specified.
However, in the /word/media directory there is an image file named image1.wmf.
So I guess that after the equation is edited in the MathType editor, it is converted to an image?
How does MathType know about the image file when the user has to edit the equation again?

Re: Math expression issue..

Posted: Wed Nov 04, 2009 9:35 am
by jason
dqkit wrote: Note that it does have a <v:imagedata r:id="rId6" o:title=""/> element, but the src attribute is never specified.


The way it usually works is that the image is just for display in the Word document; it's what the user might see on the document surface (perhaps if they don't have MathType installed).

If you look at rId6 in document.xml's rels, it should point to your image1.wmf?

dqkit wrote: How does MathType know about the image file when the user has to edit the equation again?


It wouldn't. The OLE part (e.g. "/word/embeddings/oleObject1.bin") would contain the actual data, which MathType could open/edit/update.

(Saving that to a file ought to be no issue; that's been done for embedded PDFs etc.)
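
The rId6 and rId7 lookups discussed in this thread can be sketched with the JDK alone. The following reads the contents of word/_rels/document.xml.rels and maps each relationship id to a part name. The class name DocumentRels is made up, and the rels XML in main is a reconstruction of what this document's file probably contains, not a copy of it. Targets are assumed to be either absolute or plain paths relative to /word/ (no "../" segments).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: maps relationship ids from document.xml.rels to part names.
final class DocumentRels {
    static final String NS_REL = "http://schemas.openxmlformats.org/package/2006/relationships";
    static final String OD_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    static Map<String, String> partNamesById(String relsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(relsXml.getBytes(StandardCharsets.UTF_8)));
            NodeList rels = doc.getElementsByTagNameNS(NS_REL, "Relationship");
            Map<String, String> byId = new LinkedHashMap<>();
            for (int i = 0; i < rels.getLength(); i++) {
                Element rel = (Element) rels.item(i);
                if ("External".equals(rel.getAttribute("TargetMode"))) {
                    continue; // external targets are not parts inside the package
                }
                String target = rel.getAttribute("Target");
                // Relative targets are resolved against the folder of document.xml.
                byId.put(rel.getAttribute("Id"), target.startsWith("/") ? target : "/word/" + target);
            }
            return byId;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse the relationships XML", e);
        }
    }

    public static void main(String[] args) {
        // Reconstructed example content; the real file will contain more relationships.
        String relsXml = "<Relationships xmlns=\"" + NS_REL + "\">"
                + "<Relationship Id=\"rId6\" Type=\"" + OD_REL + "/image\" Target=\"media/image1.wmf\"/>"
                + "<Relationship Id=\"rId7\" Type=\"" + OD_REL + "/oleObject\""
                + " Target=\"embeddings/oleObject1.bin\"/>"
                + "</Relationships>";
        partNamesById(relsXml).forEach((id, part) -> System.out.println(id + " -> " + part));
    }
}
```

With the id resolved, the display image and the OLE data can each be fetched from the package by part name.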