Can I read bookmark from word document

Posted: Mon Jun 15, 2009 3:44 am
by Vipin
I am looking to read bookmarks from Word documents. Is there a way I can use this API to read them?

Re: Can I read bookmark from word document

Posted: Mon Jun 15, 2009 5:06 am
by jason
Sure, they are represented by org.docx4j.wml.CTBookmark; you'll find them in Body.getEGBlockLevelElts() and in P (i.e. paragraph) content, amongst other places (e.g. in tables).

Re: Can I read bookmark from word document

Posted: Thu Nov 12, 2009 3:29 am
by jason
Here is some text I wrote a while ago on this, and thought I may as well post in this thread for posterity.

For me, the w:bookmarkStart and w:bookmarkEnd elements end up as org.docx4j.wml.CTBookmark and org.docx4j.wml.CTMarkupRange respectively.

I made a simple document, which contains:
Code: Select all
<w:bookmarkStart w:id="0" w:name="mybookmark"/><w:r><w:t>stuff</w:t></w:r><w:bookmarkEnd w:id="0"/>


They became:

Code: Select all
javax.xml.bind.JAXBElement
      {http://schemas.openxmlformats.org/wordprocessingml/2006/main}bookmarkStart <----- ((JAXBElement)o).getName()
      org.docx4j.wml.CTBookmark <----- ((JAXBElement)o).getDeclaredType().getName()

javax.xml.bind.JAXBElement
      {http://schemas.openxmlformats.org/wordprocessingml/2006/main}bookmarkEnd <----- ((JAXBElement)o).getName()
      org.docx4j.wml.CTMarkupRange <----- ((JAXBElement)o).getDeclaredType().getName()

You should be able to do something like:

Code: Select all
                if ( ((JAXBElement)o).getName().getLocalPart().equals("bookmarkStart") ) {
                    org.docx4j.wml.CTBookmark bs = (org.docx4j.wml.CTBookmark)((JAXBElement)o).getValue();
                    System.out.println(" .. bookmarkStart" );
                }
               
                if ( ((JAXBElement)o).getName().getLocalPart().equals("bookmarkEnd") ) {
                    org.docx4j.wml.CTMarkupRange be = (org.docx4j.wml.CTMarkupRange)((JAXBElement)o).getValue();
                    System.out.println(" .. bookmarkEnd" );
                }


The relevant part of the XSD is:

Code: Select all
    <xsd:group name="EG_RangeMarkupElements">
        <xsd:choice>
            <xsd:element name="bookmarkStart" type="CT_Bookmark">
                <xsd:annotation>
                    <xsd:documentation>Bookmark Start</xsd:documentation>
                </xsd:annotation>
            </xsd:element>
            <xsd:element name="bookmarkEnd" type="CT_MarkupRange">
                <xsd:annotation>
                    <xsd:documentation>Bookmark End</xsd:documentation>
                </xsd:annotation>
            </xsd:element>
            <!-- remaining range markup elements omitted -->
        </xsd:choice>
    </xsd:group>

which is why they end up as org.docx4j.wml.CTBookmark and org.docx4j.wml.CTMarkupRange respectively.
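
Editor's note: in the sample document above, the start marker carries both w:id and w:name, while the end marker carries only w:id, so a start and its end are paired by id. The sketch below illustrates that on the raw XML using only the JDK's DOM parser; it does not use docx4j, and the class and method names (BookmarkIndex, namesToIds) are made up for the illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
class BookmarkIndex {

    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Maps each bookmark name to its w:id, in document order. */
    static Map<String, String> namesToIds(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for the *NS lookups below
            doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }

        Map<String, String> result = new LinkedHashMap<>();
        NodeList starts = doc.getElementsByTagNameNS(W, "bookmarkStart");
        for (int i = 0; i < starts.getLength(); i++) {
            Element start = (Element) starts.item(i);
            // Only the start marker has a name; the end marker is found via the id.
            result.put(start.getAttributeNS(W, "name"), start.getAttributeNS(W, "id"));
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<w:p xmlns:w=\"" + W + "\">"
                + "<w:bookmarkStart w:id=\"0\" w:name=\"mybookmark\"/>"
                + "<w:r><w:t>stuff</w:t></w:r>"
                + "<w:bookmarkEnd w:id=\"0\"/></w:p>";
        System.out.println(namesToIds(xml)); // {mybookmark=0}
    }
}
```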

Re: Can I read bookmark from word document

Posted: Wed Mar 17, 2010 3:56 am
by ostkurve
I have found the CTBookmark and CTMarkupRange objects in my document, but how do I get the text that is bookmarked ("stuff" in the example, I think)?

Re: Can I read bookmark from word document

Posted: Wed Mar 17, 2010 8:31 am
by jason
The bookmark start and end tags are point tags, so in XML, there is nothing "inside" as such. The start tag could be in one paragraph, and the end tag in another.

If you can assume the tags are in a single paragraph, you could get the contents by iterating through the paragraph contents.

Another way to do it would be to marshal to string, and use string operations to get the substring between the bookmark tags.
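
Editor's note: a minimal sketch of a variant of this idea. It assumes you already have the document XML as a string, and it swaps raw string operations for the JDK's DOM parser, walking the elements in document order and collecting w:t text between the named start marker and the end marker with the same w:id. Because it works in document order, it also copes with a bookmark that ends in a later paragraph. The class and method names (BookmarkText, extract) are made up for the illustration and are not docx4j API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
class BookmarkText {

    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /**
     * Returns the text of every w:t between the named bookmarkStart and the
     * bookmarkEnd with the same w:id, or null if no such bookmark exists.
     */
    static String extract(String xml, String bookmarkName) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }

        StringBuilder sb = new StringBuilder();
        String openId = null; // w:id of our bookmark once its start marker has been seen

        // "*" matches every element in the w: namespace, in document order.
        NodeList all = doc.getElementsByTagNameNS(W, "*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String local = e.getLocalName();
            if (openId == null) {
                if (local.equals("bookmarkStart")
                        && bookmarkName.equals(e.getAttributeNS(W, "name"))) {
                    openId = e.getAttributeNS(W, "id");
                }
            } else if (local.equals("bookmarkEnd")
                    && openId.equals(e.getAttributeNS(W, "id"))) {
                return sb.toString();
            } else if (local.equals("t")) {
                sb.append(e.getTextContent());
            }
        }
        // Start seen but no matching end: return what was collected so far.
        return openId == null ? null : sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<w:body xmlns:w=\"" + W + "\">"
                + "<w:p><w:r><w:t>before </w:t></w:r>"
                + "<w:bookmarkStart w:id=\"0\" w:name=\"mybookmark\"/>"
                + "<w:r><w:t>stu</w:t></w:r></w:p>"
                + "<w:p><w:r><w:t>ff</w:t></w:r><w:bookmarkEnd w:id=\"0\"/>"
                + "<w:r><w:t> after</w:t></w:r></w:p></w:body>";
        System.out.println(extract(xml, "mybookmark")); // stuff
    }
}
```

Matching the end marker on w:id rather than stopping at the first bookmarkEnd keeps the result correct when other bookmarks start or end inside the range.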

What do you want to do with the stuff between the tags? Text extraction? You could use something like TextUtils.extractText

Please let us know how you end up tackling this. Anyone with a better approach, please speak up :-)

Re: Can I read bookmark from word document

Posted: Thu Mar 18, 2010 12:06 am
by ostkurve
Thanks for the reply, Jason.
Your guess was right: I want to extract the text that is bookmarked. In my case the bookmark does not span paragraphs, so the text is all inside one paragraph. However, I don't see how I can navigate to the paragraph object when I encounter a CTBookmark or CTMarkupRange ...

Re: Can I read bookmark from word document

Posted: Thu Mar 18, 2010 9:59 pm
by jason
Given a P in which you expect bookmark start and end tags, here is a quick sketch of one approach, untested, adapted from the OpenMainDocumentAndTraverse sample:

Code: Select all
   StringBuffer sb = new StringBuffer();   // collects the bookmarked text
   boolean inBookmark = false;             // true between bookmarkStart and bookmarkEnd
   
   void walkList(List children){
            
      for (Object o : children ) {               
         if ( o instanceof javax.xml.bind.JAXBElement) {
            
            JAXBElement<?> el = (JAXBElement<?>)o;
            System.out.println( "\n" + XmlUtils.JAXBElementDebug(el) );

            String localName = el.getName().getLocalPart();
            if ( localName.equals("bookmarkStart") ) {
               System.out.println(" .. bookmarkStart" );
               inBookmark = true;
            } else if ( localName.equals("bookmarkEnd") ) {
               System.out.println(" .. bookmarkEnd" );
               inBookmark = false;
            } else if ( el.getDeclaredType().getName().equals("org.docx4j.wml.Text") ) {
               // w:t inside a run: keep its text only while inside the bookmark
               org.docx4j.wml.Text t = (org.docx4j.wml.Text)el.getValue();
               if (inBookmark) {
                  sb.append( t.getValue() );
               }
            }
            
         } else {
            System.out.println("  " + o.getClass().getName() );
            if ( o instanceof org.docx4j.wml.R) {
               // the text lives inside runs, so recurse into the run's content
               org.docx4j.wml.R  run = (org.docx4j.wml.R)o;
               walkList(run.getRunContent());                           
            }
         }
      }
   }


Pass in p.getParagraphContent()

ostkurve wrote: how I can navigate to the paragraph object when I encounter CTBookmark or CTMarkupRange


The wml package classes each have a method getParent(), so you should be able to get from CTBookmark to P.

Is that what you meant?
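
Editor's note: whichever object model is used, the immediate parent of a bookmark marker is not guaranteed to be the paragraph itself (the marker may sit inside another inline container), so it is safer to climb the parent chain until a paragraph is reached than to cast the first parent. The sketch below shows that climb on a plain DOM tree using only the JDK; it is an analogy, not docx4j code, and the names EnclosingParagraph, enclosingP and paragraphTextOf are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
class EnclosingParagraph {

    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Climbs the parent chain until a w:p is found; returns null if there is none. */
    static Element enclosingP(Node node) {
        for (Node n = node.getParentNode(); n != null; n = n.getParentNode()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && W.equals(n.getNamespaceURI())
                    && "p".equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Text of the paragraph holding the named bookmarkStart, or null if not found. */
    static String paragraphTextOf(String xml, String bookmarkName) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        NodeList starts = doc.getElementsByTagNameNS(W, "bookmarkStart");
        for (int i = 0; i < starts.getLength(); i++) {
            Element start = (Element) starts.item(i);
            if (bookmarkName.equals(start.getAttributeNS(W, "name"))) {
                Element p = enclosingP(start);
                return p == null ? null : p.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The marker is nested one level below the paragraph on purpose.
        String xml = "<w:body xmlns:w=\"" + W + "\">"
                + "<w:p><w:r><w:t>one</w:t></w:r></w:p>"
                + "<w:p><w:hyperlink><w:bookmarkStart w:id=\"1\" w:name=\"bm\"/>"
                + "<w:r><w:t>two</w:t></w:r><w:bookmarkEnd w:id=\"1\"/>"
                + "</w:hyperlink></w:p></w:body>";
        System.out.println(paragraphTextOf(xml, "bm")); // two
    }
}
```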

Re: Can I read bookmark from word document

Posted: Wed May 16, 2012 7:45 am
by novato
Hello Jason,

I'm trying to find a bookmark and its P. I found the CTBookmark, but I am not able to get the P with getParent().

jason wrote: The wml package classes each have a method getParent(), so you should be able to get from CTBookmark to P.

The error is:

java.lang.ClassCastException: javax.xml.bind.JAXBElement cannot be cast to org.docx4j.wml.P

my code:

Code: Select all
public static org.docx4j.wml.P findBookmarkedP(String name, MainDocumentPart documentPart) throws JAXBException {
   
    final String xpath = "//w:bookmarkStart";

    List<Object> objects = documentPart.getJAXBNodesViaXPath(xpath, false);

    CTBookmark ctb = (CTBookmark) XmlUtils.unwrap(objects.get(1));
    String hh = ctb.getName();
    P p = (P) ctb.getParent();

    return p;
}

Can you help me?
thank you

Attachment: hh.docx (18.61 KiB)

Re: Can I read bookmark from word document

Posted: Wed May 16, 2012 11:41 pm
by novato
Can anyone help me? I'm lost ..............

Thanks

Re: Can I read bookmark from word document

Posted: Thu May 17, 2012 12:51 am
by Nanocom
Try to find out what the class of the parent is (call getClass() on it).

Re: Can I read bookmark from word document

Posted: Thu May 17, 2012 1:56 am
by novato
Thank you very much for the answer.

Sorry, my English is bad, my Java is bad and my docx4j is bad...

Could you write an example?

Thanks

Re: Can I read bookmark from word document

Posted: Fri May 18, 2012 9:25 pm
by jason
You may be experiencing the bug discussed at docx-java-f6/get-r-for-a-text-node-t403.html
and docx-java-f6/parent-of-ctcustomxmlblock-t616.html#p1872

You might need to use TraversalUtil instead. Here is some code:

Code: Select all
        static class RangeTraverser extends CallbackImpl {

                // Objects collected during the traversal
                List<Object> starts = new ArrayList<Object>();
                List<Object> ends   = new ArrayList<Object>();
                List<Object> refs   = new ArrayList<Object>();

                // Fully qualified class names to match against
                String startElement;
                String endElement;
                String refElement;

                RangeTraverser(String startElement, String endElement, String refElement) {

                        this.startElement = "org.docx4j.wml." + startElement;
                        this.endElement   = "org.docx4j.wml." + endElement;
                        this.refElement   = "org.docx4j.wml." + refElement;
                }

                @Override
                public List<Object> apply(Object o) {

                        if (o.getClass().getName().equals(startElement))
                                starts.add(o);

                        if (o.getClass().getName().equals(endElement))
                                ends.add(o);

                        if (o.getClass().getName().equals(refElement))
                                refs.add(o);

                        return null;
                }
        }



Invoke it with:

Code: Select all
                // paragraphs: the content to traverse (not defined in this snippet)
                RangeTraverser rt = new RangeTraverser("CTBookmark", "CTMarkupRange", null);
                new TraversalUtil(paragraphs, rt);

Re: Can I read bookmark from word document

Posted: Mon May 21, 2012 5:14 pm
by novato
Thank you very much for the reply.
It helped me a lot