
Generating .docx and .xlsx from XML

Posted: Tue Sep 25, 2012 3:05 pm
by bsqfmv
We have an application that generates HTML, PDF and CSV documents. Our data is in XML, and we use XSLT to transform it to the appropriate format and write the result to an output stream.

Now we are looking to generate .docx and .xlsx formats using docx4j. What is the recommended approach for this?

Re: Generating .docx and .xlsx from XML

Posted: Tue Sep 25, 2012 5:48 pm
by jason
What does your XML look like in general terms? Does it contain paragraph content (like, say, DocBook), or is it just short data elements that will be used to "fill in the blanks" in the docx/xlsx?

If it contains paragraph content, you could continue to use XSLT (or you could do something like the XHTML importer).

If it is data, for docx, the recommended approach is content control data binding (particularly as you already have your data in XML format). See our Getting Started documentation, opendope.org, and the sub forum.
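Roughly, content control data binding works like this: each content control in the .docx carries a `w:dataBinding` element whose `w:xpath` attribute points into a custom XML part holding your data, and when the bindings are applied, the value found at that XPath becomes the control's content. The sketch below is not docx4j's API; it uses only the JDK's `javax.xml.xpath` package to mimic that XPath-resolution step, and the `<report>` data document and the control tags (`title`, `author`) are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/**
 * Mimics the idea behind content control data binding: each control holds an
 * XPath, and the value at that XPath in the data XML fills the control.
 * Illustration only; this is not how docx4j exposes data binding.
 */
public class BindingSketch {

    /** Evaluates each control's XPath against the data XML; returns tag -> value. */
    static Map<String, String> resolve(String dataXml, Map<String, String> bindings) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document data = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(dataXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> filled = new LinkedHashMap<>();
            for (Map.Entry<String, String> b : bindings.entrySet()) {
                // String value of the first matching node, or "" if nothing matches
                filled.put(b.getKey(), xpath.evaluate(b.getValue(), data));
            }
            return filled;
        } catch (Exception e) {
            throw new IllegalStateException("Could not resolve bindings", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical data XML standing in for the application's existing data
        String data = "<report><title>Q3 Sales</title><author>Ops team</author></report>";
        // Hypothetical content control tags mapped to their XPaths
        Map<String, String> bindings = new LinkedHashMap<>();
        bindings.put("title", "/report/title");
        bindings.put("author", "/report/author");
        System.out.println(resolve(data, bindings)); // {title=Q3 Sales, author=Ops team}
    }
}
```

The point of the design is that the template (which controls exist, where they sit, how they are styled) lives in the .docx, while your code only has to supply the data XML.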

For xlsx, you could transform to Flat OPC XML, or Google "Excel XML mapping".

Re: Generating .docx and .xlsx from XML

Posted: Wed Sep 26, 2012 2:18 pm
by bsqfmv
Thanks Jason.

The XML contains reporting data consisting of tables, images, etc. (so it is paragraph content).

Is there a code sample that I could take a look at if I wanted to continue to use XSLT?

In the meantime, I'll also take a look at the XHTML importer.

Thanks again

Re: Generating .docx and .xlsx from XML

Posted: Wed Sep 26, 2012 3:36 pm
by jason
The XHTML importer approach is possibly a good fit if you have pre-existing CSS (or can create it). You can try Flying Saucer on its own (i.e. without docx4j) to see how it works for you.

Tables and images don't preclude you from using content control data binding; it can handle base64-encoded images, and ought to handle your tables if they have a simple row structure.
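Because data binding takes images as base64-encoded data, your data XML needs to carry each image as base64 text in an element that a picture content control can be bound to. Producing that element needs nothing beyond `java.util.Base64`; the sketch assumes a hypothetical `<logo>` element and uses a few placeholder bytes instead of reading a real image file.

```java
import java.util.Base64;

public class ImageDataSketch {

    /** Wraps the base64 form of the image bytes in an element named elementName. */
    static String imageElement(String elementName, byte[] imageBytes) {
        // Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', so no XML escaping is needed
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "<" + elementName + ">" + encoded + "</" + elementName + ">";
    }

    public static void main(String[] args) {
        // Placeholder: the first four bytes of the PNG signature, not a full image.
        // In practice, read the image file, e.g. with java.nio.file.Files.readAllBytes.
        byte[] pngStart = {(byte) 0x89, 'P', 'N', 'G'};
        System.out.println(imageElement("logo", pngStart)); // <logo>iVBORw==</logo>
    }
}
```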

To see what you need to do to use XSLT, create a Word document that looks like what you want, then save it as XML. This gives you Flat OPC XML, which is what your XSLT should aim to produce. You'll have to do a bit of work to get the images right, since their relationship IDs have to be handled.
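A minimal sketch of the XSLT route, assuming a hypothetical input of the form `<report><para>…</para></report>`: the stylesheet emits a Flat OPC package (a `pkg:package` root containing `pkg:part` elements) with just the package relationships part and the main document part, and the JDK's `javax.xml.transform` API writes the result to an `OutputStream`, much as an existing HTML/PDF/CSV pipeline would. It omits styles, tables and images; an image would additionally need its own part plus a relationship in `/word/_rels/document.xml.rels` whose ID the drawing references. Compare the output against a file Word has saved as XML before relying on it.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FlatOpcSketch {

    // Hypothetical stylesheet: one w:p per <para> in the input report
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:pkg='http://schemas.microsoft.com/office/2006/xmlPackage'",
        "    xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>",
        "  <xsl:template match='/report'>",
        "    <xsl:processing-instruction name='mso-application'>progid=\"Word.Document\"</xsl:processing-instruction>",
        "    <pkg:package>",
        "      <pkg:part pkg:name='/_rels/.rels'",
        "          pkg:contentType='application/vnd.openxmlformats-package.relationships+xml'>",
        "        <pkg:xmlData>",
        "          <Relationships xmlns='http://schemas.openxmlformats.org/package/2006/relationships'>",
        "            <Relationship Id='rId1' Target='word/document.xml'",
        "                Type='http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument'/>",
        "          </Relationships>",
        "        </pkg:xmlData>",
        "      </pkg:part>",
        "      <pkg:part pkg:name='/word/document.xml'",
        "          pkg:contentType='application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'>",
        "        <pkg:xmlData>",
        "          <w:document><w:body>",
        "            <xsl:for-each select='para'>",
        "              <w:p><w:r><w:t><xsl:value-of select='.'/></w:t></w:r></w:p>",
        "            </xsl:for-each>",
        "          </w:body></w:document>",
        "        </pkg:xmlData>",
        "      </pkg:part>",
        "    </pkg:package>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Transforms the report XML into Flat OPC XML and writes it to out. */
    static void toFlatOpc(String reportXml, OutputStream out) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.transform(new StreamSource(new StringReader(reportXml)), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        String report = "<report><para>First paragraph</para><para>Second paragraph</para></report>";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        toFlatOpc(report, out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```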

Another way of looking at the choice between XSLT and content control data binding is that data binding puts the design of the document in the hands of business users (so you can potentially inject your data into a variety of documents with different content), whereas XSLT is generally going to give you a single hard-coded result.