Find & Replace Text

Posted: Wed Oct 28, 2009 9:36 pm
by jsimas
Hello.

I'm trying to perform a find & replace on a docx document. I've already tried POI 3.5, without success.

I'm now looking at docx4j, but I'm a bit lost on how to change text (for example, inside a docx4j element such as org.docx4j.wml.P).

Can anyone give me some pointers on how to find a specific string inside a docx and replace it with something else?

Thanks in advance!

Best regards

JSimas

Re: Find & Replace Text

Posted: Wed Oct 28, 2009 11:57 pm
by jbeltran
A P typically holds a list of objects, usually runs (R), accessible via p.getParagraphContent(). Each R in turn holds a list of objects, usually Text objects, accessible via r.getRunContent(). Each Text object has a setValue method, and that's where you edit the text of that specific object.
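A minimal sketch of that traversal. So that it compiles without docx4j, P, R and Text below are hypothetical stand-ins that only mimic the shape described above (getParagraphContent(), getRunContent(), setValue()). In real docx4j the run content often holds the Text wrapped in a JAXBElement, so you would unwrap it before the instanceof check. This version only finds text that sits entirely inside one Text node.

```java
import java.util.ArrayList;
import java.util.List;

final class RunWalk {

    // Hypothetical stand-in for org.docx4j.wml.Text (illustration only).
    static final class Text {
        private String value;
        Text(String value) { this.value = value; }
        String getValue() { return value; }
        void setValue(String value) { this.value = value; }
    }

    // Hypothetical stand-in for org.docx4j.wml.R.
    static final class R {
        private final List<Object> runContent = new ArrayList<>();
        List<Object> getRunContent() { return runContent; }
    }

    // Hypothetical stand-in for org.docx4j.wml.P.
    static final class P {
        private final List<Object> paragraphContent = new ArrayList<>();
        List<Object> getParagraphContent() { return paragraphContent; }
    }

    /** Replaces target inside each individual Text node; returns how many Text nodes changed. */
    static int replaceInTexts(P p, String target, String replacement) {
        int changed = 0;
        for (Object pc : p.getParagraphContent()) {
            if (!(pc instanceof R)) continue;          // skip non-run children (bookmarks etc.)
            for (Object rc : ((R) pc).getRunContent()) {
                if (!(rc instanceof Text)) continue;   // skip tabs, breaks, etc.
                Text t = (Text) rc;
                String v = t.getValue();
                if (v != null && v.contains(target)) {
                    t.setValue(v.replace(target, replacement));
                    changed++;
                }
            }
        }
        return changed;
    }
}
```

For a paragraph whose single run reads "Dear [[name]],", calling replaceInTexts(p, "[[name]]", "Ana") leaves "Dear Ana," in that Text node and returns 1.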

So to find & replace, you have to traverse the elements and look for the text you care about. If you don't want to traverse them, you could instead marshal the elements you care about to a String, use a regex to replace the text, and then unmarshal the result back into the docx4j element.
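A sketch of only the regex step of that approach. It assumes you already have the marshalled XML of the element as a String and will unmarshal the result afterwards; the marshal/unmarshal calls are not shown. Two details matter: the search string is quoted so characters like [ are matched literally, and the replacement is XML-escaped so values containing & or < don't produce malformed XML. It also assumes the search string never occurs inside tag or attribute names, and it has the same fragmentation limitation as node-by-node traversal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlStringReplace {

    /** Escapes the characters that must not appear raw in XML element text. */
    static String escapeText(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Replaces every literal occurrence of target in the marshalled XML string. */
    static String replaceLiteral(String xml, String target, String replacement) {
        // The target is escaped too, because in marshalled text "&" appears as "&amp;".
        Pattern p = Pattern.compile(Pattern.quote(escapeText(target)));
        // quoteReplacement stops "$" or "\" in the value being read as group references.
        return p.matcher(xml).replaceAll(Matcher.quoteReplacement(escapeText(replacement)));
    }
}
```

For example, replacing [[name]] with Tom & Jerry in <w:t>Hello [[name]]</w:t> yields <w:t>Hello Tom &amp; Jerry</w:t>.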

Justin

Re: Find & Replace Text

Posted: Thu Oct 29, 2009 1:58 am
by jason
One thing that makes this a bit tricky is that a word can actually be fragmented across runs of text. For example, "hello" might be stored as <w:r><w:t>hell</w:t></w:r><w:r><w:t>o</w:t></w:r>. Such fragmentation can be caused by things like change tracking, formatting, and spell checking.
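A small illustration of why searching run by run misses such a word. For brevity each run is reduced to the String value of its w:t (an assumption; real runs also carry formatting and other children): searching each fragment fails, while searching the joined paragraph text succeeds.

```java
import java.util.List;

final class FragmentCheck {

    /** What a naive per-run search sees: true only if one fragment holds the whole target. */
    static boolean foundInSomeRun(List<String> runTexts, String target) {
        for (String t : runTexts) {
            if (t.contains(target)) return true;
        }
        return false;
    }

    /** Searches the paragraph's text as the reader sees it, with all runs joined. */
    static boolean foundInParagraph(List<String> runTexts, String target) {
        return String.join("", runTexts).contains(target);
    }
}
```

With the runs "hell" and "o", foundInSomeRun(runs, "hello") is false while foundInParagraph(runs, "hello") is true.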

org/docx4j/TextUtils.java contains a method

public static void extractText(Object o, Writer w, JAXBContext jc)


which gives you a plain text representation of an object.

If you do your find against that, you'll at least know whether the string is present. You'd still need to do the replacement in the real JAXB objects, though (or be prepared to replace the paragraph contents with the plain text and lose all formatting).
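One way to do the replacement in the objects themselves, sketched under simplifying assumptions: a paragraph's runs are modelled as a mutable List<String> of w:t values (with docx4j you would collect the Text objects and call setValue on them instead). Matches are found in the joined text and mapped back to run offsets; the replacement is written into the run where the match starts, so it takes on that run's formatting, and the rest of the match is trimmed out of the following runs. This is a technique sketch, not part of the docx4j API.

```java
import java.util.List;

final class CrossRunReplace {

    /** Replaces every occurrence of target, even when split across runs; runs is modified in place. */
    static int replaceAcrossRuns(List<String> runs, String target, String replacement) {
        if (target.isEmpty()) throw new IllegalArgumentException("empty target");
        int count = 0;
        int from = 0;
        while (true) {
            String joined = String.join("", runs);
            int start = joined.indexOf(target, from);
            if (start < 0) return count;
            int end = start + target.length();            // exclusive

            int[] s = locate(runs, start);                // run holding the first matched char
            int[] e = locate(runs, end - 1);              // run holding the last matched char
            int sr = s[0], so = s[1], er = e[0], eo = e[1] + 1;

            String first = runs.get(sr);
            if (sr == er) {
                runs.set(sr, first.substring(0, so) + replacement + first.substring(eo));
            } else {
                String last = runs.get(er);
                runs.set(sr, first.substring(0, so) + replacement);
                for (int i = sr + 1; i < er; i++) runs.set(i, "");  // runs fully inside the match
                runs.set(er, last.substring(eo));
            }
            count++;
            from = start + replacement.length();          // never rescan the inserted text
        }
    }

    /** Maps an index in the joined text to {runIndex, offsetWithinRun}. */
    private static int[] locate(List<String> runs, int index) {
        int base = 0;
        for (int i = 0; i < runs.size(); i++) {
            int len = runs.get(i).length();
            if (index < base + len) return new int[] {i, index - base};
            base += len;
        }
        throw new IndexOutOfBoundsException("index " + index);
    }
}
```

With runs ["Say hell", "o", " world"], replacing hello with goodbye leaves ["Say goodbye", "", " world"]. Emptied runs can be left in place or removed; a run whose text now begins or ends with a space needs xml:space="preserve" on its w:t for Word to keep that space.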

A find/replace method would be a nice addition to the code base, so feel free to post whatever you come up with as a contribution, and we can help to polish it.

On that note, please look at the DocumentModel class. Right now it doesn't deal with the document at this level of detail, but you could imagine a class associated with each org.docx4j.wml.P (paragraph) object, holding the plain-text representation of the paragraph (and, for other purposes, the vertical space the paragraph will occupy on the page).

Re: Find & Replace Text

Posted: Thu Oct 29, 2009 12:39 pm
by jsimas
Thanks, jbeltran and jason, for your quick replies.

In my case, the search string will be something like [[xpto:fieldtoreplacebyobjectvalue]]. I'm not expecting it to be fragmented across different <w:r> elements (I hope!).
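For placeholders of that shape, one option is a regex that picks out every [[prefix:name]] and looks the name up in a map. This assumes the prefix (xpto here) is only a namespace and that the map is keyed by the part after the colon; fields with no value are left as they are. It works on a plain String, so it can be applied to each Text value.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderFill {

    // [[prefix:name]]: group 1 = prefix (no brackets or colons), group 2 = name (no brackets).
    private static final Pattern FIELD =
            Pattern.compile("\\[\\[([^\\[\\]:]+):([^\\[\\]]+)\\]\\]");

    /** Replaces each [[prefix:name]] whose name has a value in the map; unknown fields stay. */
    static String fill(String text, Map<String, String> values) {
        Matcher m = FIELD.matcher(text);
        StringBuffer out = new StringBuffer();   // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String value = values.get(m.group(2));
            String repl = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(repl));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With fieldtoreplacebyobjectvalue mapped to 42, "Total: [[xpto:fieldtoreplacebyobjectvalue]] [[xpto:other]]" becomes "Total: 42 [[xpto:other]]".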

Still, when I have the time (I'm a bit pressed right now, because I've already wasted a lot of time with POI), I hope to look into the docx4j code (I already have it) and find the best possible solution.

Thanks once again.

Best regards

Jsimas

Re: Find & Replace Text

Posted: Thu Oct 29, 2009 1:22 pm
by jason
Ok, have a look at lines 654-676 of http://dev.plutext.org/trac/docx4j/brow ... Utils.java

cheers .. Jason