Table of contents

Posted: Mon Aug 03, 2009 11:14 am
by jason
Notes following a quick look at table of contents.

It depends whether you want a complete generated TOC, or just the TOC field code (which means relying on Word to generate the TOC from the field code: either by the user pressing F9, or by using a macro to automate that, or by printing).

The TOC field code is simple enough; it's just:

Code: Select all
                            <w:p>
                                <w:r>
                                    <w:fldChar w:fldCharType="begin"/>
                                </w:r>
                                <w:r>
                                    <w:instrText xml:space="preserve"> TOC \o "1-3" \h \z \u </w:instrText>
                                </w:r>
                                <w:r>
                                    <w:fldChar w:fldCharType="end"/>
                                </w:r>
                            </w:p>

If you have that (and a Styles part), Word will generate a TOC for you when you tell it to update the field codes (I think it will create the bookmarks as well). Without a Styles part, paragraphs styled Heading 1 to Heading 3 aren't detected as headings.
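
As a rough illustration of the field structure above, here is a minimal plain-Java sketch that assembles the same begin / instrText / end runs as a string and checks that the result is well-formed. The class and method names (TocFieldXml, paragraph, escape, parse) are made up for this example and are not docx4j API; the namespace declaration is added to the paragraph only so that the fragment can be parsed on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of docx4j.
final class TocFieldXml {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns a standalone w:p holding a field with the given instruction. */
    static String paragraph(String instruction) {
        return "<w:p xmlns:w=\"" + W_NS + "\">"
            + "<w:r><w:fldChar w:fldCharType=\"begin\"/></w:r>"
            + "<w:r><w:instrText xml:space=\"preserve\">"
            + escape(instruction)
            + "</w:instrText></w:r>"
            + "<w:r><w:fldChar w:fldCharType=\"end\"/></w:r>"
            + "</w:p>";
    }

    /** Escapes the characters that are significant in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Parses the fragment namespace-aware; throws if it is not well-formed. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("TOC field XML is not well-formed", e);
        }
    }
}
```

Calling TocFieldXml.paragraph(" TOC \\o \"1-3\" \\h \\z \\u ") yields the paragraph shown above on a single line. With docx4j on the classpath, a string like this can be turned into objects with XmlUtils.unmarshalString and added to the document body; that step is left out here so the sketch has no dependencies.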

But if you need Word to generate the TOC from the field code, you may as well use Word to insert the field code as well (i.e. don't use docx4j to do any of the TOC work). The exception is where the user will print the document, since printing updates the TOC automatically.

To generate a complete TOC using docx4j, you need to create the actual TOC entries. Boiled down to their bare essence, these look like:

Code: Select all
                            <w:p>
                                <w:hyperlink w:anchor="_Toc236597049" w:history="1">
                                    <w:r>
                                        <w:t>TOC entry text</w:t>
                                    </w:r>
                                    <w:r>
                                        <w:tab/>
                                    </w:r>
                                    <w:r>
                                        <w:fldChar w:fldCharType="begin"/>
                                    </w:r>
                                    <w:r>
                                        <w:instrText xml:space="preserve"> PAGEREF _Toc236597049 \h </w:instrText>
                                    </w:r>
                                    <w:r>
                                        <w:fldChar w:fldCharType="separate"/>
                                    </w:r>
                                    <w:r>
                                        <w:t>page #</w:t>
                                    </w:r>
                                    <w:r>
                                        <w:fldChar w:fldCharType="end"/>
                                    </w:r>
                                </w:hyperlink>
                            </w:p>

You also need to include corresponding bookmarks around each heading in the document:

Code: Select all
                    <w:p>
                        <w:pPr>
                            <w:pStyle w:val="Heading1"/>
                        </w:pPr>
                        <w:bookmarkStart w:id="1" w:name="_Toc236597049"/>
                        <w:r>
                            <w:t>H2</w:t>
                        </w:r>
                        <w:bookmarkEnd w:id="1"/>
                    </w:p>
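
The important detail is that the hyperlink anchor, the PAGEREF instruction and the bookmark all carry the same name. A minimal plain-Java sketch of generating a matching pair follows; TocEntryXml and its methods are hypothetical names for this example (not docx4j API), the w: prefix is assumed to be declared on an enclosing element, and the page number is only a placeholder until something computes the real one.

```java
// Hypothetical helper for illustration only; not part of docx4j.
final class TocEntryXml {

    /** Word-style bookmark name, e.g. _Toc236597049. */
    static String bookmarkName(int n) {
        return "_Toc" + n;
    }

    /** One TOC line: a hyperlink to the bookmark plus a PAGEREF field. */
    static String entry(String bookmark, String text, String pagePlaceholder) {
        requireName(bookmark);
        return "<w:p>"
            + "<w:hyperlink w:anchor=\"" + bookmark + "\" w:history=\"1\">"
            + "<w:r><w:t>" + escape(text) + "</w:t></w:r>"
            + "<w:r><w:tab/></w:r>"
            + "<w:r><w:fldChar w:fldCharType=\"begin\"/></w:r>"
            + "<w:r><w:instrText xml:space=\"preserve\"> PAGEREF "
            + bookmark + " \\h </w:instrText></w:r>"
            + "<w:r><w:fldChar w:fldCharType=\"separate\"/></w:r>"
            + "<w:r><w:t>" + escape(pagePlaceholder) + "</w:t></w:r>"
            + "<w:r><w:fldChar w:fldCharType=\"end\"/></w:r>"
            + "</w:hyperlink>"
            + "</w:p>";
    }

    /** The heading paragraph, wrapped in the bookmark the entry points at. */
    static String heading(int id, String bookmark, String style, String text) {
        requireName(bookmark);
        requireName(style);
        return "<w:p>"
            + "<w:pPr><w:pStyle w:val=\"" + style + "\"/></w:pPr>"
            + "<w:bookmarkStart w:id=\"" + id + "\" w:name=\"" + bookmark + "\"/>"
            + "<w:r><w:t>" + escape(text) + "</w:t></w:r>"
            + "<w:bookmarkEnd w:id=\"" + id + "\"/>"
            + "</w:p>";
    }

    /** Rejects values that would not be safe inside an attribute. */
    private static void requireName(String s) {
        if (!s.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("not a simple name: " + s);
        }
    }

    /** Escapes the characters that are significant in XML text content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

For each heading you would call bookmarkName once and pass the result to both entry and heading, using a distinct id per bookmark.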



As for page numbers, Word will update these before printing (if Word is so configured).

If you intend to print directly from docx4j (i.e. via PDF), you should consider generating the TOC only at the PDF stage (unless you need to save the intermediate Word document).

Word 2007 includes its TOC bookmarks in an SDT, but I expect the SDT can be safely omitted.

Re: Table of contents

Posted: Mon Sep 14, 2009 8:39 pm
by eford
Thanks, again. As usual, your tips were very helpful.

jason wrote: Word 2007 includes its TOC bookmarks in an SDT, but I expect the SDT can be safely omitted.


Actually, the SDT proved to be essential for me. My table of contents, and sequencing of the actual data, is defined by a sitemap which the user can dynamically edit. As a result, I have to iterate through that map building the individual "pages" of the docx file at the same time I accumulate the TOC entries. The SDT provided the container which I could insert into the document at its proper location while continuing to add entries to it as I process the individual page data. I am still faced with the user having to manually update the TOC field to get it to reflect the correct page numbering but that's a Word issue, as has been discussed in other articles on this forum.
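
The pattern described here (put the container in place first, then keep adding entries to it while the pages are built) can be sketched in plain Java. This is only an illustration under assumptions: the sitemap is modelled as an ordered map of title to page text, SitemapToc is a made-up class, and a plain list stands in for the SDT content rather than the docx4j objects actually used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the container pattern; not docx4j API.
final class SitemapToc {

    /** Builds body XML with the TOC container first, followed by the pages. */
    static String build(Map<String, String> sitemap) {
        List<Object> body = new ArrayList<>();
        List<String> tocEntries = new ArrayList<>(); // stands in for the SDT content
        body.add(tocEntries);                        // placed at its final position up front

        for (Map.Entry<String, String> page : sitemap.entrySet()) {
            tocEntries.add(paragraph(page.getKey())); // container keeps growing in place
            body.add(paragraph(page.getKey()) + paragraph(page.getValue()));
        }

        StringBuilder out = new StringBuilder();
        for (Object block : body) {
            if (block == tocEntries) {
                out.append("<w:sdt><w:sdtContent>");
                for (String entry : tocEntries) {
                    out.append(entry);
                }
                out.append("</w:sdtContent></w:sdt>");
            } else {
                out.append((String) block);
            }
        }
        return out.toString();
    }

    /** A bare paragraph with escaped text; the w: prefix is assumed declared elsewhere. */
    static String paragraph(String text) {
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return "<w:p><w:r><w:t>" + escaped + "</w:t></w:r></w:p>";
    }
}
```

Because the container is added to the body before the loop starts, the TOC ends up ahead of the pages even though its entries are only known once every page has been processed.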

Re: Table of contents

Posted: Wed Sep 29, 2010 12:09 am
by jason
Testing this, the minimal XML required for Word to generate a TOC (including hyperlinks and associated bookmarks) is:

Code: Select all
          <w:p>
            <w:r>
              <w:fldChar w:fldCharType="begin" w:dirty="true"/>
            </w:r>
            <w:r>
              <w:instrText xml:space="preserve"> TOC \o "1-3" \h \z \u </w:instrText>
            </w:r>
            <w:r>
              <w:fldChar w:fldCharType="end"/>
            </w:r>
          </w:p> 


Note the w:dirty="true". The actual field code in instrText could be altered to meet your requirements.
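
To illustrate altering the field code, here is a small plain-Java sketch that builds the instruction string for a chosen range of heading levels. TocInstruction and its parameters are hypothetical names for this example. As far as the switches go: \o selects the heading levels, \h makes the entries hyperlinks, \z hides the tab leader and page numbers in Web Layout view, and \u uses the applied paragraph outline level.

```java
// Hypothetical helper for illustration only; not part of docx4j.
final class TocInstruction {

    /**
     * Builds a TOC field instruction covering the given heading levels (1 to 9).
     * The leading and trailing spaces match the instrText in the XML above.
     */
    static String build(int fromLevel, int toLevel, boolean hyperlinks) {
        if (fromLevel < 1 || toLevel > 9 || fromLevel > toLevel) {
            throw new IllegalArgumentException(
                "levels must satisfy 1 <= from <= to <= 9");
        }
        StringBuilder sb = new StringBuilder(" TOC \\o \"")
            .append(fromLevel).append('-').append(toLevel).append('"');
        if (hyperlinks) {
            sb.append(" \\h"); // entries become hyperlinks to the headings
        }
        sb.append(" \\z \\u ");
        return sb.toString();
    }

    /** The begin fldChar run, optionally flagged dirty so Word refreshes the field. */
    static String beginRun(boolean dirty) {
        return "<w:r><w:fldChar w:fldCharType=\"begin\""
            + (dirty ? " w:dirty=\"true\"" : "")
            + "/></w:r>";
    }
}
```

TocInstruction.build(1, 3, true) gives the same instruction as in the XML above; build(1, 2, true) would restrict the TOC to Heading 1 and Heading 2.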

Re: Table of contents

Posted: Fri Jun 17, 2011 9:38 am
by infiniteConfused
So, I used the code that allows Word to generate the TOC ("TOC \\o \"1-3\" \\h \\z \\u"), and it works wonderfully, except that every time I open the document I get the following dialog box:

"This document contains fields that may refer to other files. Do you want to update the fields in this document?"

Is there a way to get Word to automatically update the fields without the popup box? I read somewhere that setting updateFields to true (see below) should work. However, I can't find an option for updateFields and don't know whether it will actually fix the problem.

Code: Select all
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<w:settings xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:updateFields w:val="true" />
</w:settings>


Does anyone have any suggestions short of writing out each reference manually in Java?

Thanks

Re: Table of contents

Posted: Tue Jul 12, 2011 3:43 am
by mike82
I have the same problem. Adding the "updateFields" tag to settings.xml doesn't fix the problem.
Is there any other solution?

Re: Table of contents

Posted: Thu Jul 14, 2011 1:20 am
by jason
Please see http://www.samuraiprogrammer.com/blog/2 ... pened.aspx

The author says:

Both of these methods will work just fine in Word 2010.

In Word 2007, though, you need to clear out the contents of the field before the user opens the document.


I haven't confirmed this; let us know what you find.

An alternative may be an AutoOpen macro:- http://support.microsoft.com/kb/212703

See especially http://openxmldeveloper.org/blog/b/open ... macro.aspx

Note Eric's advice:

you don't want to set the <w:updateFields val='true'> element in the settings part. If this element is there, then even though there is a macro that will update the TOC, Word will still put up a modal dialog box indicating, "This document contains fields that may refer to other files. Do you want to update the fields in this document." In addition, you do not want to set the w:dirty attribute to true on the <w:fldChar w:fldCharType='begin'/> element.


Finally, see http://support.microsoft.com/kb/211629

That suggests that if you set the docx to open in print layout view, the fields might be updated automatically!

You can set that in settings.xml by calling setView(view) on the CTSettings object, where:
Code: Select all
      CTView view = Context.getWmlObjectFactory().createCTView();      
      view.setVal(STView.PRINT);
      


Let us know which of these works.

Re: Table of contents

Posted: Thu Jun 21, 2012 9:22 pm
by himanshuk
Hi Jason,

I am sorry for replying on this thread as it is quite old but I was using the information given here as reference.
I am trying to create a TOC in a docx using the sample code. I got the same error, "This document contains fields that may refer to other files. Do you want to update the fields in this document?", when I open the document in Word for the first time.
I tried all the tricks given in this thread, but I am still getting the error.
Code: Select all
         ObjectFactory factory = Context.getWmlObjectFactory();

         // Paragraph holding the TOC field: begin, instruction, end
         P p = factory.createP();

         // Field begin, flagged dirty so Word regenerates the TOC on open
         R r = factory.createR();
         FldChar fldChar = factory.createFldChar();
         fldChar.setFldCharType(STFldCharType.BEGIN);
         fldChar.setDirty(true);
         r.getContent().add(getWrappedFldChar(fldChar)); // helper not shown in this post
         p.getContent().add(r);

         // Field instruction
         R r1 = factory.createR();
         Text text = new Text();
         text.setSpace("preserve");
         text.setValue("TOC \\o \"1-3\" \\h \\z \\u");
         r1.getContent().add(factory.createRInstrText(text));
         p.getContent().add(r1);

         // Field end
         FldChar fldcharend = factory.createFldChar();
         fldcharend.setFldCharType(STFldCharType.END);
         R r2 = factory.createR();
         r2.getContent().add(getWrappedFldChar(fldcharend));
         p.getContent().add(r2);

         WordprocessingMLPackage mlPackage = WordprocessingMLPackage.createPackage();
         MainDocumentPart mainDocumentPart = mlPackage.getMainDocumentPart();

         // Adding Print View and setting updateFields to true
         CTSettings ct = new CTSettings();
         DocumentSettingsPart dsp = mainDocumentPart.getDocumentSettingsPart();
         if (dsp == null) {
            dsp = new DocumentSettingsPart();
            CTView ctView = Context.getWmlObjectFactory().createCTView();
            ctView.setVal(STView.PRINT);
            ct.setView(ctView);
            BooleanDefaultTrue b = new BooleanDefaultTrue();
            b.setVal(true);
            ct.setUpdateFields(b);
            dsp.setJaxbElement(ct);
            mainDocumentPart.addTargetPart(dsp);
         }

         // TOC paragraph first, then the headings it should pick up
         Document wmlDocument = (Document) mainDocumentPart.getJaxbElement();
         Body body = wmlDocument.getBody();
         body.getContent().add(p);
         mainDocumentPart.addStyledParagraphOfText("Heading1", "Hello World");
         mainDocumentPart.addStyledParagraphOfText("Heading2", "Docx4J Testing");
         mainDocumentPart.addStyledParagraphOfText("Heading1", "Hello World 1");
         mlPackage.save(new File("D:\\Hello.docx"));

The code above creates settings.xml and populates the updateFields and view elements successfully.
Please let me know what can be done to remove this error.

Thanks,
Himanshu

Re: Table of contents

Posted: Thu Jul 05, 2012 5:08 pm
by himanshuk
Hi Jason,

I am able to create a TOC without the page numbers. I have done so by creating a hyperlink around each line of the TOC. Each of these hyperlinks points to its corresponding Heading1 or Heading2 section. I have created a structure similar to the one Word creates.
I am still facing one problem, though. As part of the TOC, the following structure needs to be created in document.xml:
Code: Select all
<w:sdt>
   <w:sdtPr>
      <w:id w:val="1446547526"/>
      <w:docPartObj>
         <w:docPartGallery w:val="Table of Contents"/>
      </w:docPartObj>
   </w:sdtPr>
   <w:sdtEndPr>
      <w:rPr>
         <w:rFonts w:cstheme="minorBidi" w:eastAsiaTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:asciiTheme="minorHAnsi"/>
         <w:b w:val="false"/>
         <w:bCs w:val="false"/>
         <w:color w:val="auto"/>
         <w:sz w:val="24"/>
         <w:szCs w:val="24"/>
      </w:rPr>
   </w:sdtEndPr>
   <w:sdtContent></w:sdtContent></w:sdt>

This structure contains a <w:docPartObj> element. I am creating it using the org.docx4j.wml.CTSdtDocPart class. When saving the file, the exception given below occurs:
[com.sun.istack.internal.SAXException2: unable to marshal type "org.docx4j.wml.CTSdtDocPart" as an element because it is missing an @XmlRootElement annotation]
org.docx4j.openpackaging.exceptions.Docx4JException: Failed to add parts from relationships
at org.docx4j.openpackaging.io.SaveToZipFile.addPartsFromRelationships(SaveToZipFile.java:378)
at org.docx4j.openpackaging.io.SaveToZipFile.save(SaveToZipFile.java:164)
at org.docx4j.openpackaging.io.SaveToZipFile.save(SaveToZipFile.java:105)
at org.docx4j.openpackaging.packages.WordprocessingMLPackage.save(WordprocessingMLPackage.java:219)
at com.main.CreateWordDoc.main(CreateWordDoc.java:220)
Caused by: org.docx4j.openpackaging.exceptions.Docx4JException: Problem saving part word/document.xml
at org.docx4j.openpackaging.io.SaveToZipFile.saveRawXmlPart(SaveToZipFile.java:303)
at org.docx4j.openpackaging.io.SaveToZipFile.saveRawXmlPart(SaveToZipFile.java:194)
at org.docx4j.openpackaging.io.SaveToZipFile.savePart(SaveToZipFile.java:410)
at org.docx4j.openpackaging.io.SaveToZipFile.addPartsFromRelationships(SaveToZipFile.java:373)
... 4 more
Caused by: javax.xml.bind.MarshalException
- with linked exception:
[com.sun.istack.internal.SAXException2: unable to marshal type "org.docx4j.wml.CTSdtDocPart" as an element because it is missing an @XmlRootElement annotation]
at com.sun.xml.internal.bind.v2.runtime.MarshallerImpl.write(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.MarshallerImpl.marshal(Unknown Source)
at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(Unknown Source)
at org.docx4j.openpackaging.parts.JaxbXmlPart.marshal(JaxbXmlPart.java:197)
at org.docx4j.openpackaging.parts.JaxbXmlPart.marshal(JaxbXmlPart.java:175)
at org.docx4j.openpackaging.io.SaveToZipFile.saveRawXmlPart(SaveToZipFile.java:245)
... 7 more
Caused by: com.sun.istack.internal.SAXException2: unable to marshal type "org.docx4j.wml.CTSdtDocPart" as an element because it is missing an @XmlRootElement annotation
at com.sun.xml.internal.bind.v2.runtime.XMLSerializer.reportError(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.ClassBeanInfoImpl.serializeRoot(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.property.ArrayReferenceNodeProperty.serializeListBody(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.property.ArrayERProperty.serializeBody(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.ClassBeanInfoImpl.serializeBody(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.XMLSerializer.childAsXsiType(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.property.SingleElementNodeProperty.serializeBody(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.ClassBeanInfoImpl.serializeBody(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.XMLSerializer.childAsSoleContent(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.ClassBeanInfoImpl.serializeRoot(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.property.ArrayReferenceNodeProperty.serializeListBody(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.property.ArrayERProperty.serializeBody(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.ClassBeanInfoImpl.serializeBody(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.XMLSerializer.childAsXsiType(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.property.SingleElementNodeProperty.serializeBody(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.ClassBeanInfoImpl.serializeBody(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.XMLSerializer.childAsSoleContent(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.ClassBeanInfoImpl.serializeRoot(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.XMLSerializer.childAsRoot(Unknown Source)
... 13 more

I worked around the problem by modifying the docx4j source code. I added
Code: Select all
@XmlRootElement(name="docPartObj")
on top of the CTSdtDocPart class to solve it, but with the latest nightly build it is failing.

The code I am using to create the TOC heading is given below:
Code: Select all
sdtPRTocBlock.setId();
      CTSdtDocPart docPartPRToc = factory.createCTSdtDocPart();
      DocPartGallery docGal = docPartPRToc.getDocPartGallery();
      if( docGal == null ) {
         docGal = factory.createCTSdtDocPartDocPartGallery();
      }
      docGal.setVal("Table of Contents");
      docPartPRToc.setDocPartGallery(docGal);
      sdtPRTocBlock.getRPrOrAliasOrLock().add(docPartPRToc);
      
      
      //Create RPr for SDT End Pr
      RPr rPrSdtEndMain = factory.createRPr();
      //Create Fonts for Rpr
      RFonts fonts = factory.createRFonts();
      fonts.setAsciiTheme(STTheme.MINOR_H_ANSI);
      fonts.setEastAsiaTheme(STTheme.MINOR_H_ANSI);
      fonts.setHAnsiTheme(STTheme.MINOR_H_ANSI);
      fonts.setCstheme(STTheme.MINOR_BIDI);
      rPrSdtEndMain.setRFonts(fonts);
      rPrSdtEndMain.setB(falseV);
      rPrSdtEndMain.setBCs(falseV);
      Color auto = factory.createColor();
      auto.setVal("auto");
      rPrSdtEndMain.setColor(auto);
      
      HpsMeasure measure = factory.createHpsMeasure();
      measure.setVal(BigInteger.valueOf(24));
      rPrSdtEndMain.setSz(measure);
      rPrSdtEndMain.setSzCs(measure);
      sdtEndMain.getRPr().add(rPrSdtEndMain);
      
      
      P p1ForConBlo = factory.createP();
      p1ForConBlo.setRsidRDefault("3030454637453234");
      p1ForConBlo.setRsidR("3030454637453234");
      
      //Create PR for P1
      PPr prforP1 = factory.createPPr();
      PStyle pStyleforP1 = factory.createPPrBasePStyle();
      pStyleforP1.setVal("TOCHeading");
      prforP1.setPStyle(pStyleforP1);
      p1ForConBlo.setPPr(prforP1);
      
      R rforP1 = factory.createR();
      Text tforP1 = factory.createText();
      tforP1.setValue("Table of Contents");
      rforP1.getContent().add(tforP1);
      p1ForConBlo.getContent().add(rforP1);
      contentBlock.getContent().add(p1ForConBlo);

Re: Table of contents

Posted: Mon Apr 22, 2013 4:36 pm
by Vibhor
Hi Jason,

Opening the document in Print Layout view is not helping: it doesn't re-paginate the document's page numbers. And I can't set updateFields or the dirty attribute to true because of that warning message, as the application I am working on is integrated with a web console and uses a template to generate documents that are downloaded on the customer's side.
Because of this, I can't use an AutoOpen macro either, as it requires an external script to be saved on the system, which is executed each time the document is opened. And in the case of a macro-enabled document, some customers might have disabled macro execution for security reasons, as macros are scripts embedded in the document. So it may not work in those cases.
Is there any other way possible?

Please let me know.
Thanks in advance.

Re: Table of contents

Posted: Wed Aug 14, 2013 10:00 pm
by zzzimon
Hi!

I'm also getting the dialog box that says "This document contains fields that may refer to other files. Do you want to update the fields in this document" whenever I open a Word file with a table of contents (toc) that I've generated using docx4j.

I'm using the minimal xml code that you (Jason) posted in this topic:

Code: Select all
          <w:p>
            <w:r>
              <w:fldChar w:fldCharType="begin" w:dirty="true"/>
            </w:r>
            <w:r>
              <w:instrText xml:space="preserve"> TOC \o "1-3" \h \z \u </w:instrText>
            </w:r>
            <w:r>
              <w:fldChar w:fldCharType="end"/>
            </w:r>
          </w:p>


I've tried the things mentioned in this post, for example:

1. I've tried removing the dirty=true attribute in the XML code above. The dialog box disappears, but so does the table of contents.

2. I've tried adding <w:updateFields w:val="true" /> to the settings.xml file (with the dirty=true attribute removed), but the dialog box still remains. I've also tried using both the dirty attribute and the updateFields setting, but the dialog box remains.

3. I've watched the 5 video clips by Eric White (http://openxmldeveloper.org/blog/b/open ... macro.aspx). The 3rd, 4th and 5th videos all address the issue with the dialog box. The 3rd and 4th videos use a solution created in Visual Studio; as I'm using Java, I cannot use these two solutions. The 5th video shows how to use a Word macro that automatically updates the fields. The problem is that once that macro is created, it is used on all documents opened with my copy of Word. It's possible to set an option so that the macro is only run on a certain document, but then Word doesn't let you save the file in docx format, only in docm format.

4. I've also tried adding the two lines of code to change to print view, without any luck:
Code: Select all
CTView view = Context.getWmlObjectFactory().createCTView();     
view.setVal(STView.PRINT);


Have you got any other ideas? I'm using Word 2007, not 2010.
Thanks for reading

/kind regards Simon

Re: Table of contents

Posted: Thu Aug 15, 2013 8:32 pm
by jason
zzzimon wrote: It's possible to set an option that the macro should only be run on a certain document, but then it doesn't let you save the file in docx format, but rather in docm format.


Yes, you can inject a macro into your docx (and call it a docm), but by doing this you just swap the "Do you want to update the fields in this document" prompt for another one (do you want to run the macro?).

The alternative is to have docx4j do it all: generate the table entries, and the corresponding page numbers. EDIT Oct 2013 Plutext now offers docx4j TOC Helper, a commercial extension for docx4j, which can generate or update a table of contents, including page numbers. Please email sales@plutext.com for more information.

Re: Table of contents

Posted: Thu Aug 08, 2019 2:06 pm
by jason
Please note that is an old post; the relevant code has since been open sourced as part of docx4j proper.

See the sample at https://github.com/plutext/docx4j/blob/ ... eDemo.java

This does rely on either docx4j-export-fo or Plutext PDF Converter to provide the page numbers:

+ https://github.com/plutext/docx4j-export-fo
+ https://converter-eval.plutext.com/