
RichText content control to html and vice versa

Posted: Sat Dec 04, 2010 1:23 am
by jm_thia
Hi Jason,

We are building a form with a rich text control. The control is not bound to a customXml file, because we want styling, bullets and so on in the text. The content of the control will be transformed to HTML and stored in a SQL database. The content will finally be displayed in a web page, where it can also be downloaded as a docx.

I can extract the content as XML using the code below.

But I am now stuck on transforming the content to HTML.

Is there a better way than building a new docx?

I am also looking for guidelines on putting the HTML back into the content control of the docx.

Thanks,

Jean Marie

Code:
      String inputfilepath = "C:/Work/dev/testDocx/docx/acti0.docx";
      // the predicate is relative to each w:sdt, so only the 'rapport' control is selected
      String xpath = "//w:sdt[w:sdtPr/w:alias/@w:val='rapport']/w:sdtContent";
            
      WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));   
      MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();         
      
      List<Object> list = documentPart.getJAXBNodesViaXPath(xpath, false);

      System.out.println("got " + list.size() + " matching " + xpath );
      
      for (Object o : list) {
         
         System.out.println(o.getClass().getName() );
         
         Object o2 = XmlUtils.unwrap(o);
                  

         if (o2 instanceof org.docx4j.wml.SdtContentBlock) {
                        
            // marshal to XML for tracing
            String dmp = org.docx4j.XmlUtils.marshaltoString(o2, true, true);
            System.out.println(dmp);

            // transform to HTML for persistence

            // create a new document
            System.out.println( "Creating package..");
            WordprocessingMLPackage CCwml = WordprocessingMLPackage.createPackage();

            // add the content of the control to the new package (CCwml), not to the source package
            CCwml.getMainDocumentPart().addObject(
                   org.docx4j.XmlUtils.unmarshalString(dmp) );

            // transformation
            AbstractHtmlExporter exporter = new HtmlExporterNG2();
            OutputStream os = new java.io.FileOutputStream(inputfilepath + ".html");

            javax.xml.transform.stream.StreamResult result = new javax.xml.transform.stream.StreamResult(os);
            exporter.html(CCwml, result,
                     inputfilepath + "_files");
            os.close();

            System.out.println("Saved: " + inputfilepath + ".html using " +  exporter.getClass().getName() );
      
         }
         
      }


I get nice XML:

Code:
<w:sdtContent xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:ns20="http://schemas.openxmlformats.org/drawingml/2006/lockedCanvas" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:ns7="http://schemas.openxmlformats.org/schemaLibrary/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" xmlns:ns16="http://opendope.org/conditions" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:ns12="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing" xmlns:ns18="http://opendope.org/components" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:ns9="http://schemas.openxmlformats.org/drawingml/2006/chartDrawing" xmlns:ns17="http://opendope.org/questions" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:ns19="http://schemas.openxmlformats.org/drawingml/2006/compatibility" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage" xmlns:ns15="http://opendope.org/xpaths" xmlns:ns8="http://schemas.openxmlformats.org/drawingml/2006/chart" xmlns:ns10="http://schemas.openxmlformats.org/drawingml/2006/diagram">
    <w:p w:rsidR="004B40AC" w:rsidRDefault="004B40AC">
        <w:pPr>
            <w:rPr>
                <w:b/>
                <w:bCs/>
            </w:rPr>
        </w:pPr>
        <w:r>
            <w:rPr>
                <w:b/>
                <w:bCs/>
            </w:rPr>
            <w:t>Azertuiop</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidR="004B40AC" w:rsidRDefault="004B40AC">
        <w:pPr>
            <w:rPr>
                <w:bCs/>
            </w:rPr>
        </w:pPr>
        <w:r w:rsidRPr="004B40AC">
            <w:rPr>
                <w:bCs/>
            </w:rPr>
            <w:t>Et bla</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidR="00442A9C" w:rsidRDefault="00442A9C">
        <w:pPr>
            <w:rPr>
                <w:bCs/>
            </w:rPr>
        </w:pPr>
        <w:r>
            <w:rPr>
                <w:bCs/>
            </w:rPr>
            <w:t>Et puis</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidR="00442A9C" w:rsidP="00442A9C" w:rsidRDefault="00442A9C">
        <w:pPr>
            <w:pStyle w:val="Paragraphedeliste"/>
            <w:numPr>
                <w:ilvl w:val="0"/>
                <w:numId w:val="25"/>
            </w:numPr>
            <w:rPr>
                <w:bCs/>
            </w:rPr>
        </w:pPr>
        <w:r>
            <w:rPr>
                <w:bCs/>
            </w:rPr>
            <w:t>Point 1</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidRPr="00442A9C" w:rsidR="00442A9C" w:rsidP="00442A9C" w:rsidRDefault="00442A9C">
        <w:pPr>
            <w:pStyle w:val="Paragraphedeliste"/>
            <w:numPr>
                <w:ilvl w:val="0"/>
                <w:numId w:val="25"/>
            </w:numPr>
            <w:rPr>
                <w:bCs/>
            </w:rPr>
        </w:pPr>
        <w:r>
            <w:rPr>
                <w:bCs/>
            </w:rPr>
            <w:t>Point 2</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidR="00442A9C" w:rsidRDefault="00442A9C">
        <w:pPr>
            <w:rPr>
                <w:b/>
                <w:bCs/>
            </w:rPr>
        </w:pPr>
    </w:p>
    <w:p w:rsidRPr="00C71D85" w:rsidR="00C71D85" w:rsidRDefault="00C71D85">
        <w:pPr>
            <w:rPr>
                <w:b/>
                <w:bCs/>
                <w:i/>
            </w:rPr>
        </w:pPr>
        <w:r>
            <w:rPr>
                <w:b/>
                <w:bCs/>
            </w:rPr>
            <w:t>azer</w:t>
        </w:r>
        <w:r>
            <w:rPr>
                <w:b/>
                <w:bCs/>
                <w:i/>
            </w:rPr>
            <w:t>sdsqd</w:t>
        </w:r>
    </w:p>
</w:sdtContent>
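
The selection step can be checked in isolation. What follows is a minimal sketch using only the JDK's XPath API (not docx4j); the class name and the two-control sample document are made up for illustration. It compares a predicate written relative to each w:sdt with one that contains an absolute // path. The absolute form is true for every w:sdtContent as soon as one control anywhere in the document has the alias 'rapport'.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SdtXPathSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical document with two content controls; only the first has alias 'rapport'.
    static final String SAMPLE =
        "<w:body xmlns:w='" + W + "'>"
      + "<w:sdt><w:sdtPr><w:alias w:val='rapport'/></w:sdtPr>"
      + "<w:sdtContent><w:p><w:r><w:t>keep</w:t></w:r></w:p></w:sdtContent></w:sdt>"
      + "<w:sdt><w:sdtPr><w:alias w:val='other'/></w:sdtPr>"
      + "<w:sdtContent><w:p><w:r><w:t>skip</w:t></w:r></w:p></w:sdtContent></w:sdt>"
      + "</w:body>";

    // Returns how many nodes the expression selects in SAMPLE.
    static int count(String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // Map the "w" prefix used in the expressions to the WordprocessingML namespace.
            xp.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "w".equals(prefix) ? W : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });
            NodeList nodes = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String relative = "//w:sdt[w:sdtPr/w:alias/@w:val='rapport']/w:sdtContent";
        String absolute = "//w:sdt/w:sdtContent[//w:sdt/w:sdtPr/w:alias/@w:val='rapport']";
        System.out.println("relative predicate: " + count(relative));
        System.out.println("absolute predicate: " + count(absolute));
    }
}
```

With the relative predicate, only the content of the 'rapport' control is selected, which is what the conversion step needs.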

Re: RichText content control to html and vice versa

Posted: Mon Dec 06, 2010 1:58 pm
by jason
Hi Jean Marie

jm_thia wrote: we want styling, bullets and so on in the text. The content of the control will be transformed to HTML


As you'll have seen, the input to the HTML transform is a WordML package.

If you want the HTML to be styled properly, the package needs to contain a StyleDefinitionsPart (from your original docx).

If you want bullets/numbering, you'll need the NumberingDefinitionsPart.

(This is why the input is a package, rather than just document.xml)
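
To illustrate the point, here is a minimal sketch using only the JDK's DOM API (not docx4j) that lists the style and numbering IDs a fragment refers to. The class name is made up, and the sample is trimmed from the sdtContent dump earlier in the thread. The fragment holds only references such as pStyle 'Paragraphedeliste' and numId '25'; the definitions behind them live in the styles and numbering parts, so they are lost if only the fragment is carried over.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FragmentReferences {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // One list paragraph, trimmed from the sdtContent dump in the first post.
    static final String SAMPLE =
        "<w:sdtContent xmlns:w='" + W + "'>"
      + "<w:p><w:pPr><w:pStyle w:val='Paragraphedeliste'/>"
      + "<w:numPr><w:ilvl w:val='0'/><w:numId w:val='25'/></w:numPr></w:pPr>"
      + "<w:r><w:t>Point 1</w:t></w:r></w:p>"
      + "</w:sdtContent>";

    // Collects the w:val attribute of every element with the given local name.
    static Set<String> valuesOf(String xml, String localName) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagNameNS(W, localName);
            Set<String> values = new TreeSet<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(((Element) nodes.item(i)).getAttributeNS(W, "val"));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // IDs that must be resolvable in the styles part and the numbering part.
        System.out.println("style IDs referenced:     " + valuesOf(SAMPLE, "pStyle"));
        System.out.println("numbering IDs referenced: " + valuesOf(SAMPLE, "numId"));
    }
}
```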

So, for your WordML package, you can either use your original package (deleting the bits which don't match w:alias/@w:val='rapport' or, perhaps better for you, changing the XSLT src/main/java/org/docx4j/convert/out/html/docx2xhtmlNG2.xslt to ignore anything that doesn't match), or create a new package and copy the above parts into it.

If you do create a new package, note that the createPackage() method adds a style definition part. You don't want that one; you want your own.

I'll reply separately on the HTML import question.

cheers .. Jason

Re: RichText content control to html and vice versa

Posted: Thu Jan 27, 2011 10:53 pm
by jm_thia
I changed my mind and used .NET. I ended up using HtmlConverter from powertools.codeplex.com.

But I am replying to my own post just to say that I can put the data back into the rich text control using an altChunk element. One last point: don't forget to remove the "show placeholder" flag.

Jean Marie
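
The last step can be sketched as follows, using only the JDK's DOM API on the raw w:sdt XML rather than docx4j or the .NET code actually used. The class name and sample control are made up for illustration. The sketch assumes the flag in question is the w:showingPlcHdr element inside w:sdtPr, which is the WordprocessingML element that marks a control as still showing its placeholder text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PlaceholderFlagSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical content control that is still flagged as showing placeholder text.
    static final String SAMPLE =
        "<w:sdt xmlns:w='" + W + "'>"
      + "<w:sdtPr><w:alias w:val='rapport'/><w:showingPlcHdr/></w:sdtPr>"
      + "<w:sdtContent><w:p><w:r><w:t>Click here to enter text.</w:t></w:r></w:p></w:sdtContent>"
      + "</w:sdt>";

    // Removes every w:showingPlcHdr element and returns the resulting XML.
    static String clearPlaceholderFlag(String sdtXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(sdtXml)));
            NodeList flags = doc.getElementsByTagNameNS(W, "showingPlcHdr");
            // Walk backwards so removals do not shift the items still to be visited.
            for (int i = flags.getLength() - 1; i >= 0; i--) {
                Node flag = flags.item(i);
                flag.getParentNode().removeChild(flag);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(clearPlaceholderFlag(SAMPLE));
    }
}
```

If the flag is left in place, Word may treat the inserted content as placeholder text, so clearing it belongs with the step that fills the control.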