differences between htmlExporter and htmlExporterNG

Posted: Tue Aug 04, 2009 10:48 am
by kjsaila
Hi there,

I'm trying to convert a docx document to HTML and pass it to a JSP page. First I used the htmlExporter class, but the document contains some XML code examples, and after passing the converted text to the JSP page I couldn't get the example XML tags to display as text. Then I tried the htmlExporterNG class, and the preformatted content worked well, but the document also contains some tables, and those didn't come out as well as they did with htmlExporter.

So my question is: is there a simple way to fix either htmlExporter or htmlExporterNG so that both the preformatted content and the tables are converted correctly?

Other than that, I've been very pleased with docx4j. Great job.
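
For the tag display problem described above, one common approach is to escape markup characters before the text reaches the browser. The following is only a minimal sketch, assuming the XML examples arrive at the JSP as plain strings and the browser is interpreting them as markup. The class and method names are hypothetical and are not part of docx4j.

```java
public class HtmlEscapeSketch {

    // Replace the characters HTML treats as markup with entity references,
    // so that a string such as "<w:t>" is shown literally instead of parsed.
    public static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &lt;w:t&gt;Name:&lt;/w:t&gt;
        System.out.println(escapeHtml("<w:t>Name:</w:t>"));
    }
}
```

In a JSP that uses JSTL, the `<c:out>` tag escapes these characters by default, which may make a helper like this unnecessary.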

Re: differences between htmlExporter and htmlExporterNG

Posted: Tue Aug 04, 2009 6:23 pm
by jason
htmlExporter uses Microsoft's XSLT (6000 lines of it, from http://www.codeplex.com/OpenXMLViewer) to create the HTML; htmlExporterNG also uses a bit of XSLT, but more Java to do the work.

htmlExporterNG is our newer approach, which we are likely to improve over time.

I guess whichever is easier to fix for your purposes would be the way to go.

You could check whether the problem occurs with the current OpenXMLViewer code; if it doesn't, you could find the fix and copy it into docx4j's DocX2Html.xslt.

As far as htmlExporterNG is concerned, what is the table problem? IIRC, we merge cells properly, but don't handle table borders, and maybe not width/height, properly. Improvements to the table handling would be gratefully accepted.

Re: differences between htmlExporter and htmlExporterNG

Posted: Wed Aug 05, 2009 9:06 am
by kjsaila
Well, the problem with tables is that with htmlExporter I get:

Code:
<table class="TableList3-T">
      <tr class="TableList3-R">
        <td class="TableList3-C">
          <div class="TableList3-firstRow">
            <p class="Tableheader-P">
              <span class="Tableheader-H">
                <span/>
                <span>Header1</span>
              </span>
            </p>
          </div>
        </td>
        <td class="TableList3-C">
          <div class="TableList3-firstRow">
            <p class="Tableheader-P">
              <span class="Tableheader-H">
                <span />
                <span>Header2</span>
              </span>
            </p>
          </div>
        </td>
      </tr>
      <tr class="TableList3-R" style="height:0;">
        <td class="TableList3-C">
          <div class="TableList3-lastRow">
            <div class="TableList3-swCell">
              <p class="Tablecontent-P">
                <span class="Tablecontent-H">
                  <span/>
                  <span>Name:</span>
                </span>
              </p>
              <p class="Tablecontent-P">
                <span class="Tablecontent-H">
                  <span />
                  <span>Description:</span>
                </span>
              </p>
            </div>
          </div>
        </td>
        <td class="TableList3-C">
          <div class="TableList3-lastRow">
            <p class="Tablecontent-P">
              <span class="Tablecontent-H">Test name</span>
            </p>
            <p class="Tablecontent-P">
              <span class="Tablecontent-H">Test description</span>
            </p>
          </div>
        </td>
      </tr>
      <tr height="0">
        <td />
        <td />
      </tr>
    </table>


but with htmlExporterNG I get:

Code:
<table>
      <colgroup span="2" />
      <tr>
        <td />
        <td>NOT IMPLEMENTED: support for trPr</td>
      </tr>
      <tr>
        <td />
        <td>NOT IMPLEMENTED: support for trPr</td>
      </tr>
</table>


If I understand correctly, htmlExporterNG can't handle paragraphs inside table cells, or something like that.

Re: differences between htmlExporter and htmlExporterNG

Posted: Wed Aug 05, 2009 1:48 pm
by jason
Could you please post the WordML representing the table? The easiest way to get this is to save as XML from Word 2007.
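
If saving as XML from Word is not convenient, the same markup can usually be read straight out of the .docx, since a .docx file is a zip archive. The following is a minimal sketch, assuming the main document part sits at the usual `word/document.xml` location and is UTF-8 encoded. The class and method names are hypothetical, and only the JDK is used, not docx4j.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DocumentXmlDump {

    // Scan the zip entries of a .docx stream and return the text of the
    // entry named word/document.xml (assumed location of the main part).
    public static String readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    // read() returns -1 at the end of the current entry
                    while ((n = zip.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    return new String(buf.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        throw new IOException("word/document.xml not found in archive");
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DocumentXmlDump path/to/file.docx
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(readDocumentXml(in));
        }
    }
}
```

The `<w:tbl>` element for the table can then be copied out of the printed XML.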

Re: differences between htmlExporter and htmlExporterNG

Posted: Wed Aug 05, 2009 1:59 pm
by kjsaila
Here is the XML fragment:

Code:
<w:tbl><w:tblPr><w:tblStyle w:val="TableList3"/><w:tblW w:w="5000" w:type="pct"/><w:tblLook w:val="01E0"/></w:tblPr><w:tblGrid><w:gridCol w:w="2838"/><w:gridCol w:w="6030"/></w:tblGrid><w:tr w:rsidR="00E54D1F" w:rsidRPr="00A84EA1"><w:trPr><w:cnfStyle w:val="100000000000"/></w:trPr><w:tc><w:tcPr><w:tcW w:w="1600" w:type="pct"/></w:tcPr><w:p w:rsidR="00E54D1F" w:rsidRPr="00A84EA1" w:rsidRDefault="00E54D1F" w:rsidP="00732A5E"><w:pPr><w:pStyle w:val="Tableheader"/><w:rPr><w:noProof w:val="0"/><w:lang w:val="en-US"/></w:rPr></w:pPr><w:r w:rsidRPr="00A84EA1"><w:rPr><w:noProof w:val="0"/><w:lang w:val="en-US"/></w:rPr><w:t>Property</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3400" w:type="pct"/></w:tcPr><w:p w:rsidR="00E54D1F" w:rsidRPr="00A84EA1" w:rsidRDefault="00E54D1F" w:rsidP="00732A5E"><w:pPr><w:pStyle w:val="Tableheader"/><w:rPr><w:noProof w:val="0"/><w:lang w:val="en-US"/></w:rPr></w:pPr><w:r w:rsidRPr="00A84EA1"><w:rPr><w:noProof w:val="0"/><w:lang w:val="en-US"/></w:rPr><w:t>Value</w:t></w:r></w:p></w:tc></w:tr><w:tr w:rsidR="00E54D1F" w:rsidRPr="00A84EA1"><w:trPr><w:cnfStyle w:val="010000000000"/></w:trPr><w:tc><w:tcPr><w:cnfStyle w:val="000000000001"/><w:tcW w:w="1600" w:type="pct"/></w:tcPr><w:p w:rsidR="00E54D1F" w:rsidRPr="00A84EA1" w:rsidRDefault="00E54D1F" w:rsidP="00732A5E"><w:pPr><w:pStyle w:val="Tablecontent"/><w:rPr><w:noProof w:val="0"/><w:lang w:val="en-US"/></w:rPr></w:pPr><w:r w:rsidRPr="00A84EA1"><w:rPr><w:noProof w:val="0"/><w:lang w:val="en-US"/></w:rPr><w:t>Name:</w:t></w:r></w:p><w:p w:rsidR="00E54D1F" w:rsidRPr="00A84EA1" w:rsidRDefault="00E54D1F" w:rsidP="00732A5E"><w:pPr><w:pStyle w:val="Tablecontent"/><w:rPr><w:noProof w:val="0"/><w:lang w:val="en-US"/></w:rPr></w:pPr><w:r w:rsidRPr="00A84EA1"><w:rPr><w:noProof w:val="0"/><w:lang w:val="en-US"/></w:rPr><w:t>Description:</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3400" w:type="pct"/></w:tcPr><w:p w:rsidR="00E54D1F" w:rsidRPr="00A84EA1" w:rsidRDefault="00E54D1F" 
w:rsidP="00732A5E"><w:pPr><w:pStyle w:val="Tablecontent"/><w:cnfStyle w:val="010000000000"/><w:rPr><w:noProof w:val="0"/><w:lang w:val="en-US"/></w:rPr></w:pPr><w:r><w:rPr><w:noProof w:val="0"/><w:lang w:val="en-US"/></w:rPr><w:t>Test name</w:t></w:r></w:p><w:p w:rsidR="00E54D1F" w:rsidRPr="00A84EA1" w:rsidRDefault="00E54D1F" w:rsidP="00732A5E"><w:pPr><w:pStyle w:val="Tablecontent"/><w:cnfStyle w:val="010000000000"/><w:rPr><w:noProof w:val="0"/><w:lang w:val="en-US"/></w:rPr></w:pPr><w:r><w:rPr><w:noProof w:val="0"/><w:lang w:val="en-US"/></w:rPr><w:t>Test description</w:t></w:r></w:p></w:tc></w:tr></w:tbl>

Re: differences between htmlExporter and htmlExporterNG

Posted: Wed Aug 05, 2009 11:11 pm
by jason
This is fixed in current SVN (changeset 865). The attributes on

Code:
<w:tr w:rsidR="00E54D1F" w:rsidRPr="00A84EA1">


were causing problems.

Re: differences between htmlExporter and htmlExporterNG

Posted: Thu Aug 06, 2009 6:39 am
by kjsaila
Thanks for the fix. I'm currently using the 2.2.0 jar, so I'll let you know how it works as soon as I have time to test the new version.

Re: differences between htmlExporter and htmlExporterNG

Posted: Mon Sep 07, 2009 9:51 am
by kjsaila
This also works. Job well done.