unable to read content of TOC in a docx file
Posted:
Tue Aug 03, 2021 6:36 pm
by softkumsh
Hi,
I am trying to read the TOC content of my docx file. My use case: given a TOC item, I need to return the content that belongs to it.
I tried using TocFinder, but no luck.
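// Get the JAXB document element and its body from the main document part.
Document wmlDocumentEl = documentPart.getJaxbElement();
Body body = wmlDocumentEl.getBody();
// Walk the body content with a TocFinder, then ask it for the TOC content control.
TocFinder finder = new TocFinder();
new TraversalUtil(body.getContent(), finder);
SdtBlock currentSDT = finder.getTocSDT();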
It is returning only the following:
[Contents, TOC \o "1-3" \h \z \u , , , , , , , , , , , , , , , , , , , , , , , ]
I am not sure how to read the content based on the TOC.
Please help with this.
Re: unable to read content of TOC in a docx file
Posted:
Thu Aug 12, 2021 8:06 pm
by jason
Looking at the docx you posted at https://github.com/plutext/docx4j/issues/470, I can see it contains the XML below, which is typical TOC content.
What precisely do you want to do with it? You could, for example, process each paragraph in the w:sdtContent.
But it sounds like you instead want to process the docx content, for example the text at w:bookmarkStart/@w:name="_Toc52376231". If so, you could ignore the TOC and instead iterate through the docx looking for bookmarkStart. There may be some code in
https://github.com/plutext/docx4j/tree/ ... docx4j/toc which helps you.
<w:sdt>
<w:sdtPr>
<w:rPr>
<w:rFonts w:ascii="Arial" w:eastAsia="Times New Roman" w:hAnsi="Arial" w:cs="Tahoma"/>
<w:b w:val="0"/>
<w:bCs w:val="0"/>
<w:color w:val="auto"/>
<w:sz w:val="22"/>
<w:szCs w:val="24"/>
</w:rPr>
<w:id w:val="-1352567643"/>
<w:docPartObj>
<w:docPartGallery w:val="Table of Contents"/>
<w:docPartUnique/>
</w:docPartObj>
</w:sdtPr>
<w:sdtEndPr>
<w:rPr>
<w:rFonts w:asciiTheme="minorHAnsi" w:eastAsiaTheme="minorEastAsia" w:hAnsiTheme="minorHAnsi" w:cstheme="minorBidi"/>
<w:szCs w:val="22"/>
</w:rPr>
</w:sdtEndPr>
<w:sdtContent>
<w:p w14:paraId="23484590" w14:textId="1D9D9A5C" w:rsidR="0032730B" w:rsidRPr="00C36B9B" w:rsidRDefault="0032730B" w:rsidP="00EB7D6B">
<w:pPr>
<w:pStyle w:val="TOCHeading"/>
<w:numPr>
<w:ilvl w:val="0"/>
<w:numId w:val="0"/>
</w:numPr>
</w:pPr>
<w:r w:rsidRPr="00C36B9B">
<w:t>Contents
</w:t>
</w:r>
</w:p>
<w:p w14:paraId="15E20DE3" w14:textId="020D64F3" w:rsidR="00C449B3" w:rsidRDefault="0032730B">
<w:pPr>
<w:pStyle w:val="TOC1"/>
<w:tabs>
<w:tab w:val="left" w:pos="440"/>
<w:tab w:val="right" w:leader="dot" w:pos="9019"/>
</w:tabs>
<w:rPr>
<w:noProof/>
</w:rPr>
</w:pPr>
<w:r w:rsidRPr="00C36B9B">
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r w:rsidRPr="00C36B9B">
<w:instrText xml:space="preserve"> TOC \o "1-3" \h \z \u
</w:instrText>
</w:r>
<w:r w:rsidRPr="00C36B9B">
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:hyperlink w:anchor="_Toc52376228" w:history="1">
<w:r w:rsidR="00C449B3" w:rsidRPr="002E6AAE">
<w:rPr>
<w:rStyle w:val="Hyperlink"/>
<w:noProof/>
</w:rPr>
<w:t>1
</w:t>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
</w:rPr>
<w:tab/>
</w:r>
<w:r w:rsidR="00C449B3" w:rsidRPr="002E6AAE">
<w:rPr>
<w:rStyle w:val="Hyperlink"/>
<w:noProof/>
</w:rPr>
<w:t>Document Purpose
</w:t>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
<w:tab/>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
<w:instrText xml:space="preserve"> PAGEREF _Toc52376228 \h
</w:instrText>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
<w:t>5
</w:t>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
<w:fldChar w:fldCharType="end"/>
</w:r>
</w:hyperlink>
</w:p>
<w:p w14:paraId="1C3B687A" w14:textId="3F0C1144" w:rsidR="00C449B3" w:rsidRDefault="00331CF3">
<w:pPr>
<w:pStyle w:val="TOC2"/>
<w:tabs>
<w:tab w:val="right" w:leader="dot" w:pos="9019"/>
</w:tabs>
<w:rPr>
<w:noProof/>
</w:rPr>
</w:pPr>
<w:hyperlink w:anchor="_Toc52376229" w:history="1">
<w:r w:rsidR="00C449B3" w:rsidRPr="002E6AAE">
<w:rPr>
<w:rStyle w:val="Hyperlink"/>
<w:noProof/>
</w:rPr>
<w:t>Governance
</w:t>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
<w:tab/>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
<w:instrText xml:space="preserve"> PAGEREF _Toc52376229 \h
</w:instrText>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
<w:t>6
</w:t>
</w:r>
<w:r w:rsidR="00C449B3">
<w:rPr>
<w:noProof/>
<w:webHidden/>
</w:rPr>
<w:fldChar w:fldCharType="end"/>
</w:r>
</w:hyperlink>
</w:p>
:
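The bookmark approach suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not docx4j code: it uses the JDK's DOM parser on the raw WordprocessingML instead of docx4j's JAXB classes, and the class and method names (BookmarkSectionReader, sectionText) are made up for the example. It assumes the w:bookmarkStart sits inside the heading paragraph, and that a section ends at the next sibling paragraph carrying a "_Toc..." bookmark.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j: returns the text of the section a TOC item points to.
class BookmarkSectionReader {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Heading paragraph for the given bookmark name, followed by the paragraphs up to the next "_Toc" bookmark.
    static List<String> sectionText(String documentXml, String anchor) {
        Document doc = parse(documentXml);
        Element heading = null;
        NodeList starts = doc.getElementsByTagNameNS(W, "bookmarkStart");
        for (int i = 0; i < starts.getLength() && heading == null; i++) {
            Element start = (Element) starts.item(i);
            if (anchor.equals(start.getAttributeNS(W, "name"))) {
                heading = enclosingParagraph(start); // assumption: bookmark is inside the heading w:p
            }
        }
        List<String> section = new ArrayList<>();
        if (heading == null) return section;
        section.add(text(heading));
        for (Node n = heading.getNextSibling(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) continue;
            Element e = (Element) n;
            if (hasTocBookmark(e)) break;                 // next TOC target starts here
            if (isW(e, "p")) section.add(text(e));        // tables etc. are skipped in this sketch
        }
        return section;
    }

    static boolean isW(Node n, String localName) {
        return W.equals(n.getNamespaceURI()) && localName.equals(n.getLocalName());
    }

    static Element enclosingParagraph(Node n) {
        while (n != null && !isW(n, "p")) n = n.getParentNode();
        return (Element) n;
    }

    static boolean hasTocBookmark(Element e) {
        NodeList starts = e.getElementsByTagNameNS(W, "bookmarkStart");
        for (int i = 0; i < starts.getLength(); i++) {
            if (((Element) starts.item(i)).getAttributeNS(W, "name").startsWith("_Toc")) return true;
        }
        return false;
    }

    // Concatenates all w:t text inside the paragraph.
    static String text(Element p) {
        StringBuilder sb = new StringBuilder();
        NodeList ts = p.getElementsByTagNameNS(W, "t");
        for (int i = 0; i < ts.getLength(); i++) sb.append(ts.item(i).getTextContent());
        return sb.toString().trim();
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // needed for the w: namespace lookups
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }
}
```

The input would be the contents of the word/document.xml entry of the .docx (which is a zip archive). Since sub-headings normally have their own "_Toc..." bookmarks, a level 1 item returns only the text up to its first sub-heading.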
Re: unable to read content of TOC in a docx file
Posted:
Wed Aug 18, 2021 2:08 am
by softkumsh
My use case is to read the TOC values based on the TOC item we pass in.
I am able to read all the content by treating the sdtContent as paragraphs, but then everything comes back as one list of paragraphs (including the TOC).
It would be great if I could get all the TOC entries as a list.
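One possible way to get the TOC entries as a list is sketched below. It is an illustration only and does not use docx4j: it parses the raw WordprocessingML with the JDK's DOM parser, and the names TocEntryReader, TocEntry and read are invented for the example. It assumes, as in the XML posted earlier in this thread, that each entry is a w:hyperlink inside a paragraph styled TOC1 to TOC9, and that the runs flagged w:webHidden hold the cached page number.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j: lists the TOC entries found in WordprocessingML.
class TocEntryReader {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // One TOC line: level (from the TOCn style), target bookmark, title, cached page number.
    static final class TocEntry {
        final int level;
        final String anchor;
        final String title;
        final String page;
        TocEntry(int level, String anchor, String title, String page) {
            this.level = level; this.anchor = anchor; this.title = title; this.page = page;
        }
    }

    static List<TocEntry> read(String documentXml) {
        Document doc = parse(documentXml);
        List<TocEntry> entries = new ArrayList<>();
        NodeList links = doc.getElementsByTagNameNS(W, "hyperlink");
        for (int i = 0; i < links.getLength(); i++) {
            Element link = (Element) links.item(i);
            String anchor = link.getAttributeNS(W, "anchor");
            int level = tocLevel(link);
            if (level == 0 || anchor.isEmpty()) continue; // not a TOC line
            StringBuilder title = new StringBuilder();
            StringBuilder page = new StringBuilder();
            NodeList runs = link.getElementsByTagNameNS(W, "r");
            for (int j = 0; j < runs.getLength(); j++) {
                Element run = (Element) runs.item(j);
                // Assumption: runs flagged w:webHidden carry the page number.
                boolean hidden = run.getElementsByTagNameNS(W, "webHidden").getLength() > 0;
                StringBuilder target = hidden ? page : title;
                for (Node c = run.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if (!W.equals(c.getNamespaceURI())) continue;
                    if ("t".equals(c.getLocalName())) target.append(c.getTextContent());
                    else if ("tab".equals(c.getLocalName())) target.append(' ');
                }
            }
            entries.add(new TocEntry(level, anchor, clean(title), clean(page)));
        }
        return entries;
    }

    // Returns n if the hyperlink sits in a paragraph styled TOCn, otherwise 0.
    static int tocLevel(Element link) {
        Node n = link.getParentNode();
        while (n != null && !(W.equals(n.getNamespaceURI()) && "p".equals(n.getLocalName()))) {
            n = n.getParentNode();
        }
        if (n == null) return 0;
        NodeList styles = ((Element) n).getElementsByTagNameNS(W, "pStyle");
        if (styles.getLength() == 0) return 0;
        String val = ((Element) styles.item(0)).getAttributeNS(W, "val");
        return val.matches("TOC[1-9]") ? val.charAt(3) - '0' : 0;
    }

    // Collapses whitespace runs (tabs, line breaks) into single spaces.
    static String clean(StringBuilder sb) {
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // needed for the w: namespace lookups
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }
}
```

Each entry's anchor is the bookmark name of the heading it points to, so the list can be used to look up the matching w:bookmarkStart in the body. The page value is whatever Word cached when the TOC was last updated, and the TOCn style ids are a heuristic that may not hold for every template.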