reading the value of an activeX component with docx4j

Posted: Wed Jan 12, 2011 12:53 am
by Lars
Hi all,

I've got a document with several ActiveX controls. Is there a way to read the values of these controls with docx4j, e.g. the text of an ActiveX text field? Loading and manipulating the documents works well, but I have no idea how to read these elements.

kind regards,
Lars

Re: reading the value of an activeX component with docx4j

Posted: Wed Jan 12, 2011 7:54 pm
by jason
I created a Word docm containing a text box and a label.

I patched docx4j (http://dev.plutext.org/trac/docx4j/changeset/1388) so that the ActiveX .bin parts are recognised as OleObjectBinaryPart. The PartsList sample now shows:

Code: Select all
        Part /word/activeX/activeX2.xml [org.docx4j.openpackaging.parts.ActiveXControlXmlPart] http://schemas.openxmlformats.org/officeDocument/2006/relationships/control
            Part /word/activeX/activeX2.bin [org.docx4j.openpackaging.parts.WordprocessingML.OleObjectBinaryPart] http://schemas.microsoft.com/office/2006/relationships/activeXControlBinary

        Part /word/activeX/activeX1.xml [org.docx4j.openpackaging.parts.ActiveXControlXmlPart] http://schemas.openxmlformats.org/officeDocument/2006/relationships/control
            Part /word/activeX/activeX1.bin [org.docx4j.openpackaging.parts.WordprocessingML.OleObjectBinaryPart] http://schemas.microsoft.com/office/2006/relationships/activeXControlBinary


These can then be examined via the methods in OleObjectBinaryPart. For example, where p is the .bin part:

Code: Select all
            ((OleObjectBinaryPart)p).initPOIFSFileSystem();
            ((OleObjectBinaryPart)p).viewFile(true);


For my example, this results in:

Code: Select all
##contents
##CompObj
Root Entry
  CompObj <(0x01)CompObj> [112 / 0x70]
  contents [64 / 0x40]
POIFS FileSystem
  Property: "Root Entry"
    Name          = "Root Entry"
    Property Type = 5
    Node Color    = 1
    Time 1        = 0
    Time 2        = 0
  CompObj
    Property: "CompObj"
      Name          = "CompObj"
      Property Type = 2
      Node Color    = 1
      Time 1        = 0
      Time 2        = 0
    Document: "CompObj" size = 112
      00000000 01 00 FE FF 03 0A 00 00 FF FF FF FF 23 9E 8C 97 ............#...
      00000010 B0 D4 CE 11 BF 2D 00 AA 00 3F 40 D0 1A 00 00 00 .....-...?@.....
      00000020 4D 69 63 72 6F 73 6F 66 74 20 46 6F 72 6D 73 20 Microsoft Forms
      00000030 32 2E 30 20 4C 61 62 65 6C 00 10 00 00 00 45 6D 2.0 Label.....Em
      00000040 62 65 64 64 65 64 20 4F 62 6A 65 63 74 00 0E 00 bedded Object...
      00000050 00 00 46 6F 72 6D 73 2E 4C 61 62 65 6C 2E 31 00 ..Forms.Label.1.
      00000060 F4 39 B2 71 00 00 00 00 00 00 00 00 00 00 00 00 .9.q............
  contents
    Property: "contents"
      Name          = "contents"
      Property Type = 2
      Node Color    = 1
      Time 1        = 0
      Time 2        = 0
    Document: "contents" size = 64
      00000000 00 02 20 00 2B 00 00 00 00 00 00 00 FF FF FF 00 .. .+...........
      00000010 07 00 00 80 4D 79 4C 61 62 65 6C 00 EC 09 00 00 ....MyLabel.....
      00000020 7B 02 00 00 00 02 18 00 35 00 00 00 07 00 00 80 {.......5.......
      00000030 D8 00 00 00 00 02 00 00 43 61 6C 69 62 72 69 00 ........Calibri.

##contents
##CompObj
Root Entry
  CompObj <(0x01)CompObj> [116 / 0x74]
  contents [68 / 0x44]
POIFS FileSystem
  Property: "Root Entry"
    Name          = "Root Entry"
    Property Type = 5
    Node Color    = 1
    Time 1        = 0
    Time 2        = 0
  CompObj
    Property: "CompObj"
      Name          = "CompObj"
      Property Type = 2
      Node Color    = 1
      Time 1        = 0
      Time 2        = 0
    Document: "CompObj" size = 116
      00000000 01 00 FE FF 03 0A 00 00 FF FF FF FF 10 1D D2 8B ................
      00000010 42 EC CE 11 9E 0D 00 AA 00 60 02 F3 1C 00 00 00 B........`......
      00000020 4D 69 63 72 6F 73 6F 66 74 20 46 6F 72 6D 73 20 Microsoft Forms
      00000030 32 2E 30 20 54 65 78 74 42 6F 78 00 10 00 00 00 2.0 TextBox.....
      00000040 45 6D 62 65 64 64 65 64 20 4F 62 6A 65 63 74 00 Embedded Object.
      00000050 10 00 00 00 46 6F 72 6D 73 2E 54 65 78 74 42 6F ....Forms.TextBo
      00000060 78 2E 31 00 F4 39 B2 71 00 00 00 00 00 00 00 00 x.1..9.q........
      00000070 00 00 00 00                                     ....
  contents
    Property: "contents"
      Name          = "contents"
      Property Type = 2
      Node Color    = 1
      Time 1        = 0
      Time 2        = 0
    Document: "contents" size = 68
      00000000 00 02 24 00 01 01 40 80 00 00 00 00 1B 48 80 2C ..$...@......H.,
      00000010 09 00 00 80 EC 09 00 00 7B 02 00 00 4D 79 54 65 ........{...MyTe
      00000020 78 74 42 6F 78 00 00 00 00 02 18 00 35 00 00 00 xtBox.......5...
      00000030 07 00 00 80 D8 00 00 00 00 02 00 00 43 61 6C 69 ............Cali
      00000040 62 72 69 00                                     bri.


You can see the text I entered: "MyLabel" in the label control, and "MyTextBox" in the textbox control.
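
If you just want to pull that text out programmatically, here is a rough sketch in plain Java. It is only a heuristic "strings"-style scan over the bytes of the "contents" stream, not a real parser for the control's binary layout. The class and method names (PrintableRuns, printableRuns, fromHex) are made up for illustration, and the bytes are hard-coded from the label dump above; in practice you would feed in whatever bytes you read from the stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: heuristic scan for printable ASCII runs.
// It does NOT understand the control's actual binary format.
class PrintableRuns {

    // The 64 bytes of the label's "contents" stream, copied from the dump above.
    static final String LABEL_CONTENTS_HEX =
          "00 02 20 00 2B 00 00 00 00 00 00 00 FF FF FF 00 "
        + "07 00 00 80 4D 79 4C 61 62 65 6C 00 EC 09 00 00 "
        + "7B 02 00 00 00 02 18 00 35 00 00 00 07 00 00 80 "
        + "D8 00 00 00 00 02 00 00 43 61 6C 69 62 72 69 00";

    // Convert whitespace-separated hex pairs into a byte array.
    static byte[] fromHex(String hex) {
        String[] tokens = hex.trim().split("\\s+");
        byte[] out = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = (byte) Integer.parseInt(tokens[i], 16);
        }
        return out;
    }

    // Collect every run of printable ASCII (0x20..0x7E) that is at least minLength long.
    static List<String> printableRuns(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int v = b & 0xFF;
            if (v >= 0x20 && v <= 0x7E) {
                current.append((char) v);
            } else {
                if (current.length() >= minLength) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        if (current.length() >= minLength) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        // prints [MyLabel, Calibri]
        System.out.println(printableRuns(fromHex(LABEL_CONTENTS_HEX), 4));
    }
}
```

Note the scan returns the font name as well as the caption, and it will miss text that is not stored as single-byte ASCII, so treat it as a quick look rather than a reliable way to read the value.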

Please note the POIFS methods are from Apache POI. To dig deeper, you'll need to understand the OLE format. See http://poi.apache.org/poifs/fileformat.html
See also the Microsoft specs, including for example [MS-CFB] Compound File Binary File Format, and [MS-OLEDS] OLE Data Structures.
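
As a small taste of what those specs give you, here is a sketch that reads the strings out of the CompObj stream of the label. It assumes the layout I read from the dump and from [MS-OLEDS]: a 28-byte header, followed by length-prefixed, NUL-terminated single-byte strings with a 4-byte little-endian length. The class and method names are hypothetical, the ISO-8859-1 decoding is an assumption, and the bytes are hard-coded from the dump above. The second field can hold a marker value instead of a plain length in other files; the sketch simply stops when a length does not fit.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads the leading length-prefixed strings of a CompObj stream.
class CompObjStrings {

    // The 112 bytes of the label's "CompObj" stream, copied from the dump above.
    static final String LABEL_COMPOBJ_HEX =
          "01 00 FE FF 03 0A 00 00 FF FF FF FF 23 9E 8C 97 "
        + "B0 D4 CE 11 BF 2D 00 AA 00 3F 40 D0 1A 00 00 00 "
        + "4D 69 63 72 6F 73 6F 66 74 20 46 6F 72 6D 73 20 "
        + "32 2E 30 20 4C 61 62 65 6C 00 10 00 00 00 45 6D "
        + "62 65 64 64 65 64 20 4F 62 6A 65 63 74 00 0E 00 "
        + "00 00 46 6F 72 6D 73 2E 4C 61 62 65 6C 2E 31 00 "
        + "F4 39 B2 71 00 00 00 00 00 00 00 00 00 00 00 00";

    static final int HEADER_SIZE = 28; // assumed fixed-size header before the strings

    // Convert whitespace-separated hex pairs into a byte array.
    static byte[] fromHex(String hex) {
        String[] tokens = hex.trim().split("\\s+");
        byte[] out = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = (byte) Integer.parseInt(tokens[i], 16);
        }
        return out;
    }

    // Read up to maxStrings length-prefixed strings that follow the header.
    static List<String> readStrings(byte[] data, int maxStrings) {
        List<String> result = new ArrayList<>();
        if (data.length < HEADER_SIZE) {
            return result;
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(HEADER_SIZE);
        for (int i = 0; i < maxStrings && buf.remaining() >= 4; i++) {
            int length = buf.getInt();
            if (length <= 0 || length > buf.remaining()) {
                break; // marker value or corrupt length: give up rather than guess
            }
            byte[] raw = new byte[length];
            buf.get(raw);
            // Drop the trailing NUL terminator if there is one.
            int textLength = raw[length - 1] == 0 ? length - 1 : length;
            result.add(new String(raw, 0, textLength, StandardCharsets.ISO_8859_1));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Microsoft Forms 2.0 Label, Embedded Object, Forms.Label.1]
        System.out.println(readStrings(fromHex(LABEL_COMPOBJ_HEX), 3));
    }
}
```

The first string tells you which kind of control you are looking at, and the third looks like its ProgID, which is handy for deciding how to interpret the "contents" stream.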

Please let us know how you go.

hth .. Jason