Use cases for manipulating MS Word docx files

Posted: Thu Jul 09, 2009 12:45 am
by jbeltran
Hi all,

I'm doing initial research for a proof of concept for a project, and I'm trying to see how mature docx4j's capabilities are with regard to the following:

1. Parsing text in documents (i.e. in paragraphs, tables, etc.)
2. Merging different word documents
3. Creating hyperlinks (not to external URLs, but to other places in document)
4. Creating table of contents

If these are possible, are there any code samples around for them? Thanks in advance!

Justin

Re: Use cases for manipulating MS Word docx files

Posted: Thu Jul 09, 2009 8:28 am
by jason
Hi Justin

jbeltran wrote: 1. Parsing text in documents (i.e. in paragraphs, tables, etc.)


Can you explain what you want to do a bit more? Certainly you can get and manipulate the text ...
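
To give a rough idea of what "getting the text" involves at the file-format level: the main document part of a docx is XML in which paragraphs are w:p elements and the visible text sits in w:t elements, including inside table cells. The sketch below is only an illustration using the plain JDK, not docx4j's own API. The class and method names (DocxTextSketch, paragraphTexts) are made up, and it assumes you already have the contents of word/document.xml as a string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, plain JDK only; not part of docx4j.
class DocxTextSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of every w:p element in document order (table cells included). */
    static List<String> paragraphTexts(String documentXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed to match elements by namespace
            Document doc = dbf.newDocumentBuilder().parse(
                    new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));

            List<String> out = new ArrayList<>();
            NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element p = (Element) paragraphs.item(i);
                // A paragraph's text is split across runs; join every w:t below it.
                NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
                StringBuilder sb = new StringBuilder();
                for (int j = 0; j < texts.getLength(); j++) {
                    sb.append(texts.item(j).getTextContent());
                }
                out.add(sb.toString());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }
}
```

Note that a paragraph nested inside another one (a text box, for example) would have its text reported twice by this approach, so treat it as a starting point only.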

jbeltran wrote: 2. Merging different word documents


You can certainly add/remove parts, and manipulate their contents.

Update Nov 2010: I've created a paid extension for docx4j which provides a general solution to the problem of merging documents. See http://dev.plutext.org/blog/2010/11/mer ... documents/ for details.

jbeltran wrote: 3. Creating hyperlinks (not to external URLs, but to other places in document)


These are bookmarks, I think, aren't they? docx4j does support those.
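
For reference, this is roughly the markup involved: the target is wrapped in a w:bookmarkStart / w:bookmarkEnd pair, and the link is a w:hyperlink whose w:anchor attribute names that bookmark. The sketch below just builds those two fragments as strings with the plain JDK. The class and method names are invented for illustration, and how you attach the result to a document through docx4j is deliberately left out.

```java
// Hypothetical helper, plain JDK only; not part of docx4j.
class InternalLinkSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** A paragraph whose text is wrapped in a bookmark: the link target. */
    static String bookmarkedParagraph(int id, String name, String text) {
        // The same id must appear on both bookmarkStart and bookmarkEnd,
        // and should be unique within the document.
        return "<w:p xmlns:w=\"" + W_NS + "\">"
                + "<w:bookmarkStart w:id=\"" + id + "\" w:name=\"" + escape(name) + "\"/>"
                + "<w:r><w:t>" + escape(text) + "</w:t></w:r>"
                + "<w:bookmarkEnd w:id=\"" + id + "\"/>"
                + "</w:p>";
    }

    /** A paragraph containing a hyperlink that jumps to the named bookmark. */
    static String linkToBookmark(String name, String text) {
        // w:anchor (rather than an r:id relationship) marks this as an internal link.
        return "<w:p xmlns:w=\"" + W_NS + "\">"
                + "<w:hyperlink w:anchor=\"" + escape(name) + "\">"
                + "<w:r><w:t>" + escape(text) + "</w:t></w:r>"
                + "</w:hyperlink>"
                + "</w:p>";
    }

    /** Minimal XML escaping for text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

For example, bookmarkedParagraph(0, "section2", "Section 2") paired with linkToBookmark("section2", "Jump to Section 2"). The link run carries no character style here, so it will not look like a link unless you add one.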

jbeltran wrote: 4. Creating table of contents


I haven't done this, but I expect it would be fairly straightforward. Will Word be the ultimate consumer? If it is, maybe you just add the field code (iirc), and let Word generate the TOC. If you need to populate the table yourself, do you need up-to-date page numbers? If you have merged documents, then you won't be able to rely on lastRenderedPageBreak to help you with the page counting.
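
If you go the field code route, the markup is a complex field: a "begin" fldChar, the TOC instruction in w:instrText, a "separate" fldChar, some placeholder result text, and an "end" fldChar. Here is a sketch that builds that paragraph as a string with the plain JDK. The names (TocFieldSketch, tocFieldParagraph) are hypothetical, and marking the field w:dirty is my assumption about how to get Word to offer to refresh it on open; otherwise the user updates the field by hand.

```java
// Hypothetical helper, plain JDK only; not part of docx4j.
class TocFieldSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /**
     * Builds a paragraph holding a TOC field covering the given heading levels.
     * The placeholder is shown until Word recalculates the field; it must not
     * contain XML special characters, since it is not escaped here.
     */
    static String tocFieldParagraph(int fromLevel, int toLevel, String placeholder) {
        // \o = heading levels to include, \h = make entries hyperlinks,
        // \z = hide page numbers in web layout, \u = honour outline levels.
        String instr = " TOC \\o \"" + fromLevel + "-" + toLevel + "\" \\h \\z \\u ";
        return "<w:p xmlns:w=\"" + W_NS + "\">"
                // Assumption: dirty="true" asks the consumer to refresh the field.
                + "<w:r><w:fldChar w:fldCharType=\"begin\" w:dirty=\"true\"/></w:r>"
                + "<w:r><w:instrText xml:space=\"preserve\">" + instr + "</w:instrText></w:r>"
                + "<w:r><w:fldChar w:fldCharType=\"separate\"/></w:r>"
                + "<w:r><w:t>" + placeholder + "</w:t></w:r>"
                + "<w:r><w:fldChar w:fldCharType=\"end\"/></w:r>"
                + "</w:p>";
    }
}
```

This leaves the page numbers entirely to Word, which sidesteps the lastRenderedPageBreak problem, but it does mean the TOC is empty until the document has been opened and updated there.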

cheers

Jason