Parsing docx to change style of individual words

Posted: Thu Sep 23, 2010 10:40 pm
by zievereir
Hi,
First of all thanks for this awesome library.

I'd like to be able to parse a whole docx and change the style of certain individual words. For example, in my document I want "Car" to be bold everywhere.

I've been thinking about inserting the words one by one (instead of paragraph by paragraph) into my new document, checking each word against the one I want in bold. This looks quite tedious and inefficient, but it seems to be the only solution, since you are forced to define the style before the text you want to insert.

Is there any other way I can achieve this?

Thanks.

Re: Parsing docx to change style of individual words

Posted: Thu Sep 23, 2010 11:15 pm
by jason
Thanks for your kind words :-)

Are you processing existing Word documents, or creating them from scratch in docx4j?

This is important, because it determines whether you can assume your word is found in a single w:t (text) element, or might be split across 2 of them.
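To make the split-word case concrete, here is a minimal plain-Java sketch (no docx4j involved). It assumes you have already collected the text of each w:t in a paragraph into a list, in document order; the class and method names are hypothetical. It reports which runs a word touches, so you can tell whether the simple single-w:t approach would find it.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitWordCheck {

    /**
     * Returns the indexes of the runs touched by the first occurrence of
     * word in the concatenated run texts, or an empty list if it is absent.
     * More than one index means the word is split across w:t elements.
     */
    public static List<Integer> runsTouchedBy(List<String> runTexts, String word) {
        List<Integer> touched = new ArrayList<>();
        String joined = String.join("", runTexts);
        int start = joined.indexOf(word);
        if (word.isEmpty() || start < 0) {
            return touched;
        }
        int end = start + word.length(); // exclusive end offset of the match
        int offset = 0;                  // start offset of the current run
        for (int i = 0; i < runTexts.size(); i++) {
            int runEnd = offset + runTexts.get(i).length();
            // The run overlaps the match if their offset ranges intersect.
            if (offset < end && runEnd > start) {
                touched.add(i);
            }
            offset = runEnd;
        }
        return touched;
    }
}
```

With run texts "My C" and "ar is red", the word "Car" touches both runs, which is the situation that searching each w:t on its own would miss.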

If you can assume your word is not split across w:t elements, and you are processing existing documents, there are three ways to traverse document.xml looking for your word:

- via XPath
- via XSLT
- using TraversalUtil

XPath is probably the easiest. An earlier forum post contains the XPath expression you'll be looking for. Note: there is a bug in JAXB which stops you matching on w:t and then getting its parent; you need to match the w:r which contains the relevant w:t.
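As a rough illustration of matching the w:r rather than the w:t, the sketch below runs an expression of the form //w:r[w:t[contains(., 'Car')]] against raw document.xml content. It deliberately uses only the JDK's DOM and XPath classes, not docx4j, so the class name FindRuns and its method are hypothetical. In docx4j the same expression would presumably be handed to the main document part's XPath helper (getJAXBNodesViaXPath, as far as I know; check against your version).

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FindRuns {
    public static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns every w:r element that has a w:t child containing the word. */
    public static NodeList findRuns(String documentXml, String word) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for the w: prefix to resolve
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return W_NS.equals(uri) ? "w" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return W_NS.equals(uri)
                            ? Collections.singletonList("w").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            // Pass the word as an XPath variable so quotes in it cannot
            // break the expression.
            xpath.setXPathVariableResolver(
                    name -> "word".equals(name.getLocalPart()) ? word : null);

            // Match the run (w:r), not the text element (w:t) itself.
            String expr = "//w:r[w:t[contains(., $word)]]";
            return (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }
}
```

Note that contains() is a plain substring test, so searching for "Car" would also match runs containing "Cars" or "Carpet"; any word-boundary check has to happen afterwards in Java.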

Once you've found the w:t containing your word, you need to split the w:t, creating new w:r parents for each w:t, and stick these in the document in place of the existing w:r/w:t. That code would more or less be the same for each method.
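The splitting step could be sketched roughly as follows. This is not docx4j code: it works on the raw XML with the JDK's DOM API, and the class RunSplitter and its methods are hypothetical names for illustration. It assumes the run holds a single w:t and nothing else worth keeping (tabs, breaks and so on would be dropped), it does not check child ordering inside w:rPr, and it matches substrings rather than whole words.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RunSplitter {
    public static final String W =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    /** Parses XML into a namespace-aware DOM document. */
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Builds a new w:r with one w:t, copying the original run's w:rPr if any. */
    static Element newRun(Element original, String text, boolean bold) {
        Document doc = original.getOwnerDocument();
        Element run = doc.createElementNS(W, "w:r");
        NodeList props = original.getElementsByTagNameNS(W, "rPr");
        Element rPr = props.getLength() > 0
                ? (Element) props.item(0).cloneNode(true)
                : doc.createElementNS(W, "w:rPr");
        if (bold && rPr.getElementsByTagNameNS(W, "b").getLength() == 0) {
            rPr.appendChild(doc.createElementNS(W, "w:b"));
        }
        if (rPr.hasChildNodes()) {
            run.appendChild(rPr);
        }
        Element t = doc.createElementNS(W, "w:t");
        // Keep leading/trailing spaces of the text fragments.
        t.setAttributeNS(XML_NS, "xml:space", "preserve");
        t.setTextContent(text);
        run.appendChild(t);
        return run;
    }

    /** Replaces the run with one run per text piece; each occurrence of word is bold. */
    public static void splitRun(Element run, String word) {
        NodeList texts = run.getElementsByTagNameNS(W, "t");
        if (word.isEmpty() || texts.getLength() != 1) {
            return; // the sketch only handles the simple single-w:t case
        }
        String text = texts.item(0).getTextContent();
        if (!text.contains(word)) {
            return;
        }
        Node parent = run.getParentNode();
        int pos = 0;
        int hit;
        while ((hit = text.indexOf(word, pos)) >= 0) {
            if (hit > pos) { // plain text before the word
                parent.insertBefore(newRun(run, text.substring(pos, hit), false), run);
            }
            parent.insertBefore(newRun(run, word, true), run);
            pos = hit + word.length();
        }
        if (pos < text.length()) { // plain text after the last occurrence
            parent.insertBefore(newRun(run, text.substring(pos), false), run);
        }
        parent.removeChild(run); // the new runs take the original's place
    }
}
```

For a run containing "My Car is red", this leaves three runs in its place: "My ", a bold "Car", and " is red". With docx4j you would do the equivalent on the JAXB objects for the run and its text instead of DOM elements, but the shape of the logic should be the same.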

See further http://blogs.msdn.com/ericwhite/archive ... raphs.aspx

You can't readily re-use that code, though, since it relies on LINQ: http://stackoverflow.com/questions/1217 ... t-for-linq

Still, the docx4j code should be relatively straightforward :-)

Re: Parsing docx to change style of individual words

Posted: Thu Sep 23, 2010 11:26 pm
by zievereir
Thank you for your very fast reply!
Definitely going to take a look at XPath.