How to get information about layout of paragraph

Posted: Sun Apr 15, 2012 9:58 pm
by georgezeng
I am working on manipulating a .docx file and building it into a JasperReports template file.
I need to reproduce the same layout in the Jasper file, but the first barrier comes when a paragraph spans more than one line of text in Word.
I have no idea how to get this information. Does anyone know how? I'd appreciate any help!

Re: How to get information about layout of paragraph

Posted: Mon Apr 16, 2012 10:02 am
by jason
docx4j does not have a layout model.

A layout model would be a useful thing to add for applications such as yours, and also for calculating page numbers (eg refreshing TOC).

Layout of a paragraph would be straightforward enough. A complete model (columns, images, tables, footnotes etc) would be a significant undertaking.

To lay out a paragraph, you need to know how much horizontal space you have (page width - margins - indentation), and font information (size, kerning etc).
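As a rough sketch of the horizontal-space calculation: WordprocessingML stores page size (w:pgSz), margins (w:pgMar) and paragraph indentation (w:ind) in twentieths of a point (twips). The class, method and parameter names below are made up for illustration and are not part of docx4j; you would read the values from the document yourself and pass them in. First-line and hanging indents are ignored here.

```java
public final class ParagraphWidth {
    private ParagraphWidth() {}

    /** Twentieths of a point ("twips") per point. */
    static final int TWIPS_PER_POINT = 20;

    /**
     * Width available to the text of a paragraph, in twips:
     * page width - margins - indentation.
     * All arguments are hypothetical inputs, in twips.
     */
    static int availableWidthTwips(int pageWidth, int marginLeft, int marginRight,
                                   int indentLeft, int indentRight) {
        int width = pageWidth - marginLeft - marginRight - indentLeft - indentRight;
        if (width <= 0) {
            throw new IllegalArgumentException("No horizontal space left for text");
        }
        return width;
    }

    /** Converts twips to points, the unit font sizes are normally quoted in. */
    static double twipsToPoints(int twips) {
        return twips / (double) TWIPS_PER_POINT;
    }
}
```

For example, a US Letter page (12240 twips wide) with 1-inch margins (1440 twips each) and a half-inch left indent (720 twips) leaves 8640 twips, i.e. 432 points, for text.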

(Ideally, we'd replicate Word's precise algorithm, but as a starting point (Google 'line break algorithm'), see http://en.wikipedia.org/wiki/Word_wrap and http://stackoverflow.com/questions/1758 ... -algorithm )
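To give an idea of the simplest variant, here is a minimal sketch of a greedy ("first fit") word wrap. It is not Word's algorithm and will not reproduce Word's line breaks exactly. The GreedyWrap class is hypothetical, and the measure function is an assumption: you supply whatever returns the rendered width of a string in the same units as maxWidth.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

public final class GreedyWrap {
    private GreedyWrap() {}

    /**
     * Breaks text into lines, putting as many words on each line as fit.
     * A single word wider than maxWidth is placed on a line of its own
     * (it overflows; no hyphenation is attempted).
     */
    static List<String> wrap(String text, double maxWidth,
                             ToDoubleFunction<String> measure) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is blank
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (current.length() > 0 && measure.applyAsDouble(candidate) > maxWidth) {
                // The word does not fit: close the current line, start a new one.
                lines.add(current.toString());
                current.setLength(0);
                current.append(word);
            } else {
                current.setLength(0);
                current.append(candidate);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

With a toy measure where every character is one unit wide, wrap("the quick brown fox jumps", 10, s -> s.length()) gives the three lines "the quick", "brown fox" and "jumps". For real text you could pass something based on java.awt.FontMetrics.stringWidth, bearing in mind that AWT's metrics need not match what Word computes.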

To find how much vertical space it will take, you need line spacing, plus space before/after.
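A sketch of that vertical calculation, under some assumptions: in WordprocessingML the w:spacing element carries before and after (in twips) and line, which is in 240ths of a line when lineRule is auto. The natural single-line height of the font is something you have to obtain from font metrics yourself; here it is just a parameter. The class and method names are hypothetical, not docx4j API.

```java
public final class ParagraphHeight {
    private ParagraphHeight() {}

    /**
     * Line height in points for lineRule="auto", where the w:spacing line
     * value is expressed in 240ths of a single line (240 = single spacing).
     * naturalLineHeightPt is an assumed input taken from font metrics.
     */
    static double autoLineHeightPt(double naturalLineHeightPt, int lineValue) {
        return naturalLineHeightPt * lineValue / 240.0;
    }

    /**
     * Total vertical space of a paragraph in points:
     * lines * line height + space before + space after.
     */
    static double heightPt(int lineCount, double lineHeightPt,
                           double spaceBeforePt, double spaceAfterPt) {
        if (lineCount < 0) {
            throw new IllegalArgumentException("lineCount must not be negative");
        }
        return lineCount * lineHeightPt + spaceBeforePt + spaceAfterPt;
    }
}
```

For example, three lines at a 12 pt line height with 6 pt before and 10 pt after take 3 * 12 + 6 + 10 = 52 pt. This ignores things like contextual spacing between paragraphs of the same style and space collapsing at page boundaries.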

docx4j can tell you all that stuff. If you'd like to contribute a basic model, I'd be happy to write the basic code which gets this information for you. You'd then have to fill in the details (ie the calculations based on this information).

There are quite a few projects around which you can look at to see how layout is done, for example Pango (see http://fishsoup.net/bib/PangoIuc25-paper.pdf ), Flying Saucer xhtmlrenderer (which lays out XHTML, and is used by docx4j for XHTML import), FOP (which lays out XSL FO - see http://wiki.apache.org/xmlgraphics-fop/KnuthsModel ), and AWT. I don't know of a reference book which explains the principles, though. To start off, have a read of http://behdad.org/text/

I don't mean to scare you off ... it sounds like a simple line break / word-wrap algorithm is what you are after, rather than a full-blown page layout model. In any case, a simple line break / word-wrap algorithm is a good place to start.