I was doing this with Apache POI, but it could not read the font type and size reliably, and my research suggested docx4j is better at that.
My issues so far:
1. I can't get the text of a line (paragraph) consistently; the text is split across multiple org.docx4j.wml.P/R/Text objects even though the 'line' is unbroken in the document.
2. I am not getting the font type and size consistently; they are often null, so there is no easy way to read the font information.
3. As an intermediate Java programmer, I have not found a simple example in my internet research comparable to POI's functions for reading text. POI is also limited at reading fonts (especially for docx), but it is simpler in structure.
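To make issues 1 and 2 concrete, here is a minimal sketch of the behaviour I am after. It does not use docx4j at all: `Run` is a made-up stand-in for an R object and its run properties, and the default font and size passed in are hypothetical values. It assumes a null font or size simply means "not set directly on this run", so a fallback has to be supplied from somewhere else.

```java
import java.util.Arrays;
import java.util.List;

class RunMerger {
    // Hypothetical stand-in for a run and its properties; NOT the docx4j API.
    static final class Run {
        final String text;
        final String font;     // null when the run carries no font of its own
        final Integer sizePt;  // null when the run carries no size of its own

        Run(String text, String font, Integer sizePt) {
            this.text = text;
            this.font = font;
            this.sizePt = sizePt;
        }
    }

    // Join the text of every run so the paragraph reads as one unbroken line.
    static String paragraphText(List<Run> runs) {
        StringBuilder sb = new StringBuilder();
        for (Run r : runs) {
            if (r.text != null) {
                sb.append(r.text);
            }
        }
        return sb.toString();
    }

    // Assumption: null means "not set on this run", so use the supplied default.
    static String effectiveFont(Run r, String defaultFont) {
        return r.font != null ? r.font : defaultFont;
    }

    static int effectiveSizePt(Run r, int defaultSizePt) {
        return r.sizePt != null ? r.sizePt : defaultSizePt;
    }

    public static void main(String[] args) {
        List<Run> runs = Arrays.asList(
            new Run("[Home", "Arial", 12),
            new Run("Link] ", null, null),     // font information missing here
            new Run("Welcome back", null, 10));
        System.out.println(paragraphText(runs));
        for (Run r : runs) {
            System.out.println(effectiveFont(r, "Calibri") + " " + effectiveSizePt(r, 11));
        }
    }
}
```

What I don't know is how to get the equivalent of the fallback values out of docx4j itself.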
I have hacked together a couple of versions of the traverse/dump code, and they read the file fine as designed, but I need to output the data to a tab-delimited or Excel file. I am trying to read what are called Copy Documents, which specify the content for websites and emails. I'd like to output that to an Excel or delimited text file in two columns, basically:
Designator/linkname/alias
Copy Text.
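For the output side, this is roughly what I have in mind, as a sketch using only the standard library. The class and method names are my own, and it assumes each row is already available as a designator/copy-text pair. Tabs and line breaks inside a cell are replaced with spaces so they cannot break the two-column layout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class TsvOutput {
    // Collapse tabs and line breaks to a single space and trim the ends.
    static String cleanCell(String s) {
        return s == null ? "" : s.replaceAll("[\\t\\r\\n]+", " ").trim();
    }

    // Build one tab-delimited row: designator, then copy text.
    static String toRow(String designator, String copyText) {
        return cleanCell(designator) + "\t" + cleanCell(copyText);
    }

    // Write all pairs to a UTF-8 text file, one row per line.
    // Each String[] is expected to hold {designator, copyText}.
    static void write(Path out, List<String[]> pairs) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String[] p : pairs) {
            lines.add(toRow(p[0], p[1]));
        }
        Files.write(out, lines, StandardCharsets.UTF_8);
    }
}
```

A file written this way should open in Excel as two columns when imported as tab-delimited text.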
The files are put together by Program Managers, so the formats differ: the first part is sometimes enclosed in [] brackets, sometimes bold text at a certain size, and sometimes italic (which is why I need to read the font type and size). Is there an example out there that implements this simply?
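The bracket case is the only one I can handle from the plain text alone. A minimal sketch, with names of my own choosing, assuming the designator is a single leading [...] group on the line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DesignatorSplitter {
    // Matches "[designator] copy text": group 1 is the designator, group 2 the rest.
    private static final Pattern BRACKETED =
        Pattern.compile("^\\s*\\[([^\\]]+)\\]\\s*(.*)$", Pattern.DOTALL);

    // Returns {designator, copyText}; the designator is "" when the line
    // does not start with a bracketed group.
    static String[] split(String line) {
        Matcher m = BRACKETED.matcher(line);
        if (m.matches()) {
            return new String[] { m.group(1).trim(), m.group(2).trim() };
        }
        return new String[] { "", line.trim() };
    }
}
```

The bold, sized, and italic cases cannot be detected this way, since they depend on the run formatting rather than the text.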
-OR- an easier question might be: am I picking the most difficult path by using the Traverse method (wordMLPackage) over the JAXB/OpenXML Parts method? I have noticed that the two use different methods/code.
Thanks for any help you can offer! Sorry for the newbie approach.
Michael (cveridis)