get a list of Bookmarks for text extraction

Posted: Thu May 12, 2011 4:28 am
by looker
Hi!

I need to extract text from various parts of a document.
So I've added bookmarks to the document, which will be given to a user. The user is supposed to add text at the places where the bookmarks are.

Does anyone know how to get a list of the bookmarks in a document, so that I can extract the text that follows them?

Re: get a list of Bookmarks for text extraction

Posted: Thu May 12, 2011 9:56 am
by jason
Bookmarks, like field codes, are "point" tags.

You'll find life easier if you use content controls instead. If you data-bind them, you also get "for free" an XML document containing the text you'd otherwise have to extract. For further info, see www.opendope.org

If you still want/have to use bookmarks, you'll have to traverse the main document part (and headers/footers, etc, depending on your app). See TraversalUtil.
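
To give an idea of what such a traversal looks for, here is a minimal sketch that lists bookmark names. Note the assumptions: it uses the plain JDK DOM parser on the raw document.xml string rather than docx4j's TraversalUtil, and the class and method names (BookmarkLister, listBookmarkNames) are made up for illustration. Headers and footers live in separate parts and would need the same treatment.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j: lists bookmark names found in document.xml.
final class BookmarkLister {

    // WordprocessingML main namespace (the "w:" prefix).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the w:name of every w:bookmarkStart element, in document order.
    static List<String> listBookmarkNames(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));

            NodeList starts = doc.getElementsByTagNameNS(W_NS, "bookmarkStart");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < starts.getLength(); i++) {
                Element start = (Element) starts.item(i);
                names.add(start.getAttributeNS(W_NS, "name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }
}
```

Each w:bookmarkStart is paired with a w:bookmarkEnd carrying the same w:id, so the id is what you would match on to find where a bookmark closes.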

Re: get a list of Bookmarks for text extraction

Posted: Fri May 13, 2011 3:19 am
by looker
Thank you jason!

Your advice on using content controls and data bindings seems much more robust than bookmarks.

I have one more question:

With this I can change the content of a control:
((CustomXmlDataStorageImpl)customXmlDataStorage).setNodeValueAtXPath("/data/mars_year[1]", "2012","");

With this I can read the content of a control:
((CustomXmlDataStorageImpl)customXmlDataStorage).xpathGetString("/data/func_occured[1]", "");

But all these bits of code work with plain strings.

What if I have a rich text control in which parts of the text can be underlined or bold, and I want to take the data from this control and put it into another rich text control? If I always extract the data as a String and add it as a String, I won't be able to preserve its formatting.

Could you please point me to a place where I can find a solution?

P.S.
For those who are interested in how to start implementing a task similar to mine: have a look at this topic: viewtopic.php?f=16&t=630, and use CustomXmlBinding.java as a mock-up example.

Re: get a list of Bookmarks for text extraction

Posted: Fri May 13, 2011 9:56 am
by jason
Per the OpenXML spec, binding of rich text (w:richText) content controls is not supported.

That said, you can still format a "plain text" (w:text) content control; all of its content has to share the same formatting, though.

For example:

      <w:sdt>
        <w:sdtPr>
          <w:rPr>
            <w:u w:val="single"/>
          </w:rPr>
          <w:text/>
        </w:sdtPr>
        <w:sdtContent>
          <w:r>
            <w:rPr>
              <w:u w:val="single"/>
            </w:rPr>
            <w:t>brown</w:t>
          </w:r>
        </w:sdtContent>
      </w:sdt>
 

And it can be multi-line.

bind.xslt and its supporting extension functions in org.docx4j.model.datastorage.BindingHandler support this. Perhaps you can adapt xpathGenerateRuns.
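
As a rough illustration of the multi-line case, here is a sketch in plain Java. It is not the actual xpathGenerateRuns code; the names MultiLineRuns and toRuns are made up, and it assumes the general idea is one run per line, separated by w:br, each run repeating the same w:rPr.

```java
// Hypothetical helper, not part of docx4j: turns a multi-line string into
// WordprocessingML runs that all share one run-properties fragment.
final class MultiLineRuns {

    // text:   the bound value, possibly containing line breaks
    // rPrXml: a ready-made <w:rPr>...</w:rPr> fragment, or "" for no formatting
    static String toRuns(String text, String rPrXml) {
        StringBuilder out = new StringBuilder();
        String[] lines = text.split("\r?\n", -1); // -1 keeps trailing empty lines
        for (int i = 0; i < lines.length; i++) {
            out.append("<w:r>").append(rPrXml);
            if (i > 0) {
                out.append("<w:br/>"); // line break before every line but the first
            }
            out.append("<w:t xml:space=\"preserve\">")
               .append(escape(lines[i]))
               .append("</w:t></w:r>");
        }
        return out.toString();
    }

    // Minimal escaping for XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The resulting fragment would go inside w:sdtContent, in the position of the single w:r in the example above.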

Re: get a list of Bookmarks for text extraction

Posted: Fri May 13, 2011 10:12 pm
by looker
Jason,

What do you think of the following approach?

1) Place a bookmark right before the spot where the person is expected to enter their rich text
2) Place a second bookmark right after the person's text
3) While parsing document.xml, extract the part between the two bookmarks

and insert it later wherever it is needed. A possible problem is that the extracted part might contain unclosed tags or something like that...
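
One way the unclosed-tag worry might be avoided is to collect whole w:r elements rather than cutting the raw XML text. The sketch below assumes exactly that; it uses the plain JDK DOM API instead of docx4j, and the names BookmarkRangeExtractor and runsBetween, as well as the bookmark names in the usage note, are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j: collects the runs located
// between two marker bookmarks, keeping each run's formatting intact.
final class BookmarkRangeExtractor {

    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns every w:r element that appears, in document order, after the
    // bookmark named startName and before the bookmark named endName.
    static List<Element> runsBetween(String documentXml, String startName, String endName) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }

        NodeList all = doc.getElementsByTagNameNS("*", "*"); // all elements, document order
        List<Element> runs = new ArrayList<>();
        boolean inside = false;
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (!W_NS.equals(el.getNamespaceURI())) {
                continue;
            }
            String local = el.getLocalName();
            if ("bookmarkStart".equals(local)) {
                String name = el.getAttributeNS(W_NS, "name");
                if (name.equals(startName)) {
                    inside = true;      // start collecting after the first marker
                } else if (inside && name.equals(endName)) {
                    break;              // stop at the second marker
                }
            } else if (inside && "r".equals(local)) {
                runs.add(el);           // a complete run, w:rPr included
            }
        }
        return runs;
    }
}
```

Calling runsBetween(xml, "rich_begin", "rich_end") would return complete run elements, which could then be copied into another document with Document.importNode(run, true). Paragraph boundaries are not preserved by this sketch, so text spanning several paragraphs would need extra handling.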