
how to read word file paragraph by paragraph with its style

Posted: Sat Oct 18, 2014 9:33 pm
by sdhramveer
I am trying to read a Word file paragraph by paragraph, and I need each paragraph's style at the same time. Likewise, while reading a paragraph, I need to read its text together with the character styles applied to it.
Can anyone please tell me how to do this using docx4j? It's urgent.

Example: the paragraphs in the Word file look like this:
1. "MOLECULAR ENDOCRINOLOGY : ENDOCRINE GENETICS" is a paragraph with the "chap Title" style.

"MOLECULAR ENDOCRINOLOGY" has "main title" as its character style,
"ENDOCRINE GENETICS" has "sub title" as its character style.

2. "Isolation : Digestion of DNA" is a paragraph with the "chap Title" style.

"Isolation" has "main title" as its character style,
"Digestion of DNA" has "sub title" as its character style.


So the output should look like this:
char style is: "main title", text: "MOLECULAR ENDOCRINOLOGY"
char style is: "sub title", text: "ENDOCRINE GENETICS"
char style is: "main title", text: "Isolation"
char style is: "sub title", text: "Digestion of DNA"

Re: how to read word file paragraph by paragraph with its st

Posted: Mon Oct 20, 2014 12:56 pm
by jason
Have you read the "Getting Started" document? Everything you need to know is there.

You could iterate through the main document part's content list, but that would miss paragraphs nested in tables, content controls etc.

So you are best off using a TraversalUtil approach:

https://github.com/plutext/docx4j/blob/ ... verse.java

Once you have a paragraph or a run, look for its pStyle (rStyle) in its pPr (rPr).

Re: how to read word file paragraph by paragraph with its st

Posted: Mon Oct 20, 2014 6:15 pm
by sdhramveer
Thanks for this, sir.
But I am doing it a different way because of my XML requirements.
So I did it like this:

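// Needs: java.util.List, org.docx4j.XmlUtils, org.docx4j.wml.P, org.docx4j.wml.PPr
List<Object> jNodes = docPart.getJAXBNodesViaXPath("//w:p", true);
for (Object jNode : jNodes) {
    // Dump the paragraph's XML (no XML declaration, pretty-printed).
    System.out.println(XmlUtils.marshaltoString(jNode, true, true));
    // Unwrap once, in case the node arrives wrapped in a JAXBElement.
    P p = (P) XmlUtils.unwrap(jNode);
    PPr ppr = p.getPPr();
    if (ppr != null && ppr.getPStyle() != null) {
        // String concatenation calls p.toString() for the paragraph text.
        System.out.println("Para Style>>> \"" + ppr.getPStyle().getVal() + "\" & Text>>> \"" + p + "\"");
    }
}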

Output:

<w:p w14:textId="5B2EFC17" .......>
    <w:pPr>
        <w:pStyle w:val="Chapau"/>
        <w:rPr>
            <w:color w:val="FF0000"/>
        </w:rPr>
    </w:pPr>
    <w:r w:rsidRPr="00D14AE8">
        <w:rPr>
            <w:rStyle w:val="Authors"/>
        </w:rPr>
        <w:t>Ram K.</w:t>
    </w:r>
    <w:r>
        <w:t xml:space="preserve"> </w:t>
    </w:r>
    <w:r w:rsidRPr="00D14AE8">
        <w:rPr>
            <w:rStyle w:val="etal"/>
        </w:rPr>
        <w:t>Menon</w:t>
    </w:r>
    <w:bookmarkStart w:name="_GoBack" w:id="0"/>
    <w:bookmarkEnd w:id="0"/>
</w:p>

Para Style>>> "Chapau" & Text>>> "Ram K. Menon"

I also need the paragraph's content with its character styles, like this:

Char Style>>> "Authors" & Text>>> "Ram K."
Char Style>>> "etal" & Text>>> "Menon"

I tried to get this data from the "jNode" object, but I'm getting some exceptions.
I'm not an expert in docx4j. I only started using it recently and don't have time to go through all the documentation.
So please let me know if you can help me with this and save me some time.
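
For the run-level output asked for above, here is a minimal sketch. It is not docx4j code: it swaps in the JDK's built-in DOM parser and reads the marshalled paragraph XML directly, only to show where the character style sits (the w:rStyle inside each run's w:rPr). The class name RunStyles, the method describeRuns and the SAMPLE string are made up for illustration. SAMPLE is a trimmed copy of the paragraph XML above with only the w namespace declared, and runs without a w:rStyle are skipped.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of docx4j: plain JDK DOM over the paragraph XML.
class RunStyles {
    // WordprocessingML main namespace (the "w" prefix).
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Trimmed, illustrative copy of the paragraph from the thread.
    static final String SAMPLE =
          "<w:p xmlns:w=\"" + W + "\">"
        + "<w:pPr><w:pStyle w:val=\"Chapau\"/></w:pPr>"
        + "<w:r><w:rPr><w:rStyle w:val=\"Authors\"/></w:rPr><w:t>Ram K.</w:t></w:r>"
        + "<w:r><w:t xml:space=\"preserve\"> </w:t></w:r>"
        + "<w:r><w:rPr><w:rStyle w:val=\"etal\"/></w:rPr><w:t>Menon</w:t></w:r>"
        + "</w:p>";

    // Returns one line per run that carries a character style (w:rStyle).
    static List<String> describeRuns(String paragraphXml) {
        Element p;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            p = factory.newDocumentBuilder()
                       .parse(new InputSource(new StringReader(paragraphXml)))
                       .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse paragraph XML", e);
        }

        List<String> lines = new ArrayList<>();
        NodeList runs = p.getElementsByTagNameNS(W, "r");
        for (int i = 0; i < runs.getLength(); i++) {
            Element run = (Element) runs.item(i);
            NodeList styles = run.getElementsByTagNameNS(W, "rStyle");
            if (styles.getLength() == 0) {
                continue; // run has no character style, e.g. the lone space
            }
            String style = ((Element) styles.item(0)).getAttributeNS(W, "val");

            // A run may hold several w:t elements; join their text.
            StringBuilder text = new StringBuilder();
            NodeList texts = run.getElementsByTagNameNS(W, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                text.append(texts.item(j).getTextContent());
            }
            lines.add("Char Style>>> \"" + style + "\" & Text>>> \"" + text + "\"");
        }
        return lines;
    }

    public static void main(String[] args) {
        describeRuns(SAMPLE).forEach(System.out::println);
    }
}
```

With docx4j's own objects the same values should be reachable without re-parsing: each org.docx4j.wml.R has getRPr(), and getRStyle().getVal() on that gives the character style name (null-check both, as with getPPr() and getPStyle()). The run's text is in the org.docx4j.wml.Text items of R.getContent(), which may need XmlUtils.unwrap first.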

Re: how to read word file paragraph by paragraph with its st

Posted: Tue Oct 21, 2014 7:20 am
by jason
What exception?