Plutext

Posted: **Wed Dec 05, 2012 3:03 am**

Hi Everybody,

I parse Word documents and transfer them into a wiki. It would be nice to recognize the table of contents of a docx document, so that I can use a wiki plugin to generate it and skip the table of contents content coming from the docx.

How can I implement that?

Thanks in advance.

Zsolt

Posted: **Wed Dec 05, 2012 7:11 am**

What approach are you using to parse/transfer into wiki? (XSLT, or something else?)

Of course, you need to detect the start of the ToC and the end of it. You can unzip a docx and have a look at the XML.
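A rough sketch of what that start/end detection could look like, assuming the ToC is stored as a complex field (a fldChar "begin", an instrText starting with TOC, the entries, and a matching fldChar "end"), and that each entry may contain its own nested PAGEREF field. The Kind and Run types below are a hypothetical, simplified model of run content and not docx4j classes; in a real traversal you would derive these events from the fldChar, instrText and text elements you visit.

```java
import java.util.List;

class TocFieldSkipper {

    // Hypothetical, simplified model of what a run (w:r) can carry; not a docx4j type.
    enum Kind { FIELD_BEGIN, INSTR_TEXT, FIELD_SEPARATE, FIELD_END, TEXT }

    static final class Run {
        final Kind kind;
        final String value; // instruction or text; null for fldChar markers
        Run(Kind kind, String value) { this.kind = kind; this.value = value; }
    }

    private int depth = 0;      // how many fields are currently open
    private int tocDepth = -1;  // depth at which the TOC field opened, -1 when outside

    boolean insideToc() { return tocDepth >= 0; }

    /** Returns the text of one paragraph, leaving out everything inside a TOC field. */
    String visibleText(List<Run> paragraph) {
        StringBuilder sb = new StringBuilder();
        for (Run r : paragraph) {
            switch (r.kind) {
                case FIELD_BEGIN:
                    depth++;
                    break;
                case INSTR_TEXT:
                    // The first word of the instruction is the field type, e.g. TOC or PAGEREF.
                    if (!insideToc() && "TOC".equals(r.value.trim().split("\\s+")[0])) {
                        tocDepth = depth;
                    }
                    break;
                case FIELD_END:
                    if (depth == tocDepth) tocDepth = -1; // the TOC field itself closed
                    if (depth > 0) depth--;
                    break;
                case TEXT:
                    if (!insideToc()) sb.append(r.value);
                    break;
                default: // FIELD_SEPARATE needs no action here
                    break;
            }
        }
        return sb.toString();
    }
}
```

Keep one TocFieldSkipper instance for the whole document and call visibleText for each paragraph in order, since the field usually spans many paragraphs. The depth counter is what keeps the nested PAGEREF "end" markers from being mistaken for the end of the ToC. Note that a heading paragraph placed before the field begins is not part of the field and would still come through, and that this sketch does not handle an instruction split across several instrText runs.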

To continue this thread, please paste the XML of a short ToC here.

Posted: **Wed Dec 05, 2012 6:54 pm**

Hi Jason,

I use code similar to the following to traverse the document (source code attached).

Body body = wmlDocumentEl.getBody();

new TraversalUtil(body, this);

Unfortunately, I cannot provide the entire document, and I also have difficulty editing it in vi or gedit, but here are some lines:

<w:tab w:val="clear" w:pos="9072"/></w:tabs><w:spacing w:line="260" w:lineRule="exact"/><w:rPr><w:rFonts w:cs="Arial"/></w:rPr><w:sectPr w:rsidR="007A688C" w:rsidRPr="009D3F87" w:rsidSect="00A52790"><w:headerReference w:type="default" r:id="rId9"/><w:footerReference w:type="default" r:id="rId10"/><w:headerReference w:type="first" r:id="rId11"/><w:footerReference w:type="first" r:id="rId12"/><w:pgSz w:w="11906" w:h="16838" w:code="9"/><w:pgMar w:top="652" w:right="624" w:bottom="652" w:left="1418" w:header="652" w:footer="652" w:gutter="0"/><w:cols w:space="720"/><w:titlePg/><w:docGrid w:linePitch="299"/></w:sectPr></w:pPr></w:p><w:p w14:paraId="680332F4" w14:textId="77777777" w:rsidR="007A688C" w:rsidRPr="008F21AA" w:rsidRDefault="007A688C" w:rsidP="007A688C"><w:pPr><w:rPr><w:rFonts w:cs="Arial"/><w:b/><w:bCs/></w:rPr></w:pPr><w:r w:rsidRPr="008F21AA"><w:rPr><w:rFonts w:cs="Arial"/><w:b/><w:bCs/></w:rPr><w:lastRenderedPageBreak/><w:t xml:space="preserve">Inhaltsverzeichnis </w:t></w:r></w:p><w:p w14:paraId="5DC6A76F" w14:textId="77777777" w:rsidR="0075566C" w:rsidRDefault="007A688C"><w:pPr><w:pStyle w:val="Verzeichnis1"/><w:tabs><w:tab w:val="left" w:pos="480"/><w:tab w:val="right" w:leader="dot" w:pos="9855"/></w:tabs><w:rPr><w:rFonts w:asciiTheme="minorHAnsi" w:eastAsiaTheme="minorEastAsia" w:hAnsiTheme="minorHAnsi" w:cstheme="minorBidi"/><w:noProof/><w:sz w:val="22"/><w:szCs w:val="22"/><w:lang w:eastAsia="de-DE"/></w:rPr></w:pPr><w:r w:rsidRPr="008F21AA"><w:rPr><w:rFonts w:cs="Arial"/></w:rPr><w:fldChar w:fldCharType="begin"/></w:r><w:r w:rsidRPr="008F21AA"><w:rPr><w:rFonts w:cs="Arial"/></w:rPr><w:instrText xml:space="preserve"> TOC \o "1-5" \h \z </w:instrText></w:r><w:r w:rsidRPr="008F21AA"><w:rPr><w:rFonts w:cs=

When processing each paragraph, I find content like this:

Inhaltsverzeichnis
TOC \o "1-5" \h \z 1Allgemeines PAGEREF _Toc335729390 \h 8
1.1Projektorganisation PAGEREF _Toc335729391 \h 8
1.1.1Projektverantwortliche/ Projektteam MAHLE PAGEREF _Toc335729392 \h 8

and a lot of other lines from the table of contents.

The parsing itself is correct; however, I would like to drop those lines and insert my wiki plugin to show that information instead.
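To make the goal concrete, here is a minimal text-level sketch of the kind of filtering meant, working only on the extracted paragraph strings shown above. It is a heuristic, not a robust solution: it assumes ToC lines can be recognized by the "TOC \" instruction or by "PAGEREF _Toc", and the {{toc}} macro is a hypothetical placeholder for whatever the wiki plugin actually expects.

```java
import java.util.ArrayList;
import java.util.List;

class TocLineFilter {

    // Hypothetical wiki macro; substitute whatever the wiki's ToC plugin expects.
    static final String TOC_MACRO = "{{toc}}";

    /** Heuristic: the extracted text carries the TOC instruction or a PAGEREF to a _Toc bookmark. */
    static boolean looksLikeTocLine(String text) {
        String t = text.trim();
        return t.startsWith("TOC \\") || t.contains("PAGEREF _Toc");
    }

    /** Drops ToC lines and puts the macro once where the first dropped line was. */
    static List<String> replaceToc(List<String> paragraphs) {
        List<String> out = new ArrayList<>();
        boolean macroEmitted = false;
        for (String p : paragraphs) {
            if (looksLikeTocLine(p)) {
                if (!macroEmitted) {
                    out.add(TOC_MACRO);
                    macroEmitted = true;
                }
                continue; // skip the ToC entry itself
            }
            out.add(p);
        }
        return out;
    }
}
```

With this approach the "Inhaltsverzeichnis" heading would still come through, because it is an ordinary paragraph, and a PAGEREF to a _Toc bookmark elsewhere in the body would be dropped by mistake.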

Thanks!

Regards,

Zsolt


How to skip table of Contents during parsing
