
convert content control's content to html

Posted: Thu Aug 23, 2012 5:39 pm
by buptstehc
Hi, all! I want to convert a content control's content to HTML. I think the approach is as follows:
1. Traverse the docx as in 'sample\ContentControlsInfoStructure.java' and find every content control.
2. Insert each content control into its own new docx.
3. Convert each of those docx files to HTML as in 'sample\ConvertOutHtml.java'.

This is my code snippet:
import java.io.File;
import java.util.List;

import org.docx4j.XmlUtils;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.utils.SingleTraversalUtilVisitorCallback;
import org.docx4j.utils.TraversalUtilVisitor;
import org.docx4j.wml.SdtElement;

public class ParseDocx {
   public static void main(String[] args) throws Exception {
      String input_DOCX = "resource/new.docx";
      WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
            .load(new File(input_DOCX));

      // Visit every SdtElement (content control) in the main document part
      TraversalUtilContentControlVisitor visitor = new TraversalUtilContentControlVisitor();
      IndentingVisitorCallback contentControlCallback = new IndentingVisitorCallback(visitor);

      visitor.callback = contentControlCallback;

      contentControlCallback.walkJAXBElements(
            wordMLPackage.getMainDocumentPart().getJaxbElement() );
   }

   public static class TraversalUtilContentControlVisitor extends TraversalUtilVisitor<SdtElement> {

      IndentingVisitorCallback callback; // so we can get indentation
      int count = 0; // gives each content control its own output file

      @Override
      public void apply(SdtElement element, Object parent, List<Object> siblings) {
         // Put this content control into a new, separate package.
         // The NullPointerException reported below is thrown by the addObject(...) call.
         WordprocessingMLPackage wordMLPackage = new WordprocessingMLPackage();
         if (element.getSdtContent() != null) {
            wordMLPackage.getMainDocumentPart().addObject(element);
         }

         try {
            wordMLPackage.save(new File(System.getProperty("user.dir")
                  + "/xxx-" + (count++) + ".docx"));
         } catch (Docx4JException e) {
            e.printStackTrace();
         }
      }
   }

   public static class IndentingVisitorCallback extends SingleTraversalUtilVisitorCallback {

      public IndentingVisitorCallback(TraversalUtilVisitor visitor) {
         super(visitor);
      }

      String indent = "";

      // Depth-first walk; 'indent' grows by two spaces per nesting level
      @Override
      public void walkJAXBElements(Object parent) {
         List<Object> children = getChildren(parent);
         if (children != null) {
            String oldIndent = indent;
            indent += "  ";
            for (Object o : children) {
               o = XmlUtils.unwrap(o); // strip any JAXBElement wrapper
               this.apply(o, parent, children);
               if (this.shouldTraverse(o)) {
                  walkJAXBElements(o);
               }
            }
            indent = oldIndent;
         }
      }
   }
}



However, a java.lang.NullPointerException is thrown when calling wordMLPackage.getMainDocumentPart().addObject(element). Please help!

Re: convert content control's content to html

Posted: Thu Aug 23, 2012 11:39 pm
by jason
That constructor doesn't create a MainDocumentPart for you. Try WordprocessingMLPackage.createPackage() instead.
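A minimal sketch of that suggestion, assuming docx4j is on the classpath; the class name CreatePackageSketch and the method newPackageWith are hypothetical. Because createPackage() sets up a MainDocumentPart, the subsequent addObject(...) call has a part to add the content to.

```java
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

public class CreatePackageSketch {

    /** Hypothetical helper: a new package whose body holds the given block-level content. */
    public static WordprocessingMLPackage newPackageWith(Object blockLevelContent)
            throws Exception {
        // Unlike new WordprocessingMLPackage(), createPackage() adds a MainDocumentPart
        WordprocessingMLPackage pkg = WordprocessingMLPackage.createPackage();
        MainDocumentPart main = pkg.getMainDocumentPart();
        main.addObject(blockLevelContent); // appended to the document body
        return pkg;
    }
}
```

Note that only block-level content controls (SdtBlock) are valid directly inside w:body; a run-level one (SdtRun) would need a containing paragraph.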

Re: convert content control's content to html

Posted: Fri Aug 24, 2012 5:02 am
by buptstehc

That's right, thanks! However, I now get a lot of java.lang.NullPointerExceptions when converting the newly created WordprocessingMLPackage to HTML. I suspect this is far from enough:
wordMLPackage = WordprocessingMLPackage.createPackage();
wordMLPackage.getMainDocumentPart().addObject(element);

Maybe some parts are missing. So I'd like to know how I can create a sub-WordprocessingMLPackage from its parent that contains the full information about the content control. A code example would be even better! Thanks again!

Re: convert content control's content to html

Posted: Fri Aug 24, 2012 8:49 am
by jason
You might need a styles part and a numbering part, and if there are images or other objects, you'll need to copy those across as well.
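A rough sketch of copying the styles and numbering parts from the source package into the new one, assuming a docx4j version in which StyleDefinitionsPart and NumberingDefinitionsPart have public no-argument constructors and XmlUtils.deepCopy is generic; CopyPartsSketch is a hypothetical name. It deliberately does not handle images or other related parts, which would also need their relationships re-created in the target package.

```java
import org.docx4j.XmlUtils;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart;
import org.docx4j.openpackaging.parts.WordprocessingML.StyleDefinitionsPart;

public class CopyPartsSketch {

    /** Copies styles and numbering from source to target. Images are NOT copied. */
    public static void copyStylesAndNumbering(WordprocessingMLPackage source,
                                              WordprocessingMLPackage target) throws Exception {
        MainDocumentPart src = source.getMainDocumentPart();
        MainDocumentPart dst = target.getMainDocumentPart();

        StyleDefinitionsPart srcStyles = src.getStyleDefinitionsPart();
        if (srcStyles != null) {
            StyleDefinitionsPart dstStyles = dst.getStyleDefinitionsPart();
            if (dstStyles == null) { // create the part if the target lacks one
                dstStyles = new StyleDefinitionsPart();
                dst.addTargetPart(dstStyles);
            }
            // Deep copy, so the two packages don't share JAXB objects
            dstStyles.setJaxbElement(XmlUtils.deepCopy(srcStyles.getJaxbElement()));
        }

        NumberingDefinitionsPart srcNumbering = src.getNumberingDefinitionsPart();
        if (srcNumbering != null) {
            NumberingDefinitionsPart dstNumbering = dst.getNumberingDefinitionsPart();
            if (dstNumbering == null) {
                dstNumbering = new NumberingDefinitionsPart();
                dst.addTargetPart(dstNumbering);
            }
            dstNumbering.setJaxbElement(XmlUtils.deepCopy(srcNumbering.getJaxbElement()));
        }
    }
}
```

You would call this right after WordprocessingMLPackage.createPackage() and before adding the content control, passing the loaded parent document as the source.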

As an alternative, you could try HtmlExporterNonXSLT, which has a method:

        /**
         * Generate HTML for the specified content.
         *
         * @param blockLevelContent
         * @return
         */

        public org.w3c.dom.Document export(Object blockLevelContent, String cssClass, String cssId)
 


You'll find that in the latest nightly builds.
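export(...) returns an org.w3c.dom.Document rather than a String. Below is a minimal sketch, using only the JDK's javax.xml.transform API, of turning such a DOM into HTML text. DomToHtmlString is a hypothetical helper name, and constructing the HtmlExporterNonXSLT instance itself is not shown because that may differ between docx4j versions.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;

public class DomToHtmlString {

    /** Serializes a DOM node (e.g. the Document returned by export) as HTML text. */
    public static String toHtml(Node node) throws TransformerException {
        // Identity transform: copies the DOM unchanged, written with the "html" output method
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "html");
        t.setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }
}
```

The resulting String can then be written to a file or embedded in a larger page, one fragment per content control.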

Re: convert content control's content to html

Posted: Fri Aug 24, 2012 1:19 pm
by buptstehc

Thanks, jason, it works very well!