
HTML To Docx adds extra spaces for the table

Posted: Sun Nov 24, 2019 4:37 pm
by kapil
My use case: take an HTML string and convert it to a docx.

I am using the following input and code to create the document:

inputfilepath1 points to an HTML file that contains the following string:
<table>
    <tbody>
      <tr>
        <td>
         <span>Serious Adverse Event</span>
         </td>
    </tr>
    </tbody>
  </table>



Code to generate the Word file:
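import java.io.File;
import org.docx4j.XmlUtils;
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

// New, empty package (assumed; the post does not show how wordMLPackage is created)
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
// Convert the HTML file (no base URL) and append the result to the document body
wordMLPackage.getMainDocumentPart().getContent().addAll(xHTMLImporter.convert(new File(inputfilepath1), null));
System.out.println(XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(), true, true));

File output1 = new File("C:\\html_table.docx");
wordMLPackage.save(output1);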



In the XML string printed by my code, I see the following text:
<w:t>Serious Adverse Event</w:t>


When I view the Open XML of the generated document (attached here), the same text has an extra attribute, xml:space="preserve", and is padded with spaces:

<w:t xml:space="preserve">                     Serious Adverse Event          </w:t>
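A quick way to see where such padding can come from is to parse the input fragment and list the whitespace-only text nodes inside each <td>. The sketch below uses only the JDK's DOM parser and assumes the fragment is well-formed XHTML; the names WhitespaceNodes and whitespaceInCells are illustrative, not part of docx4j. For a fragment indented like the one above, the newline and indentation before and after <span> appear as two separate text nodes, which plausibly end up as the spaces around "Serious Adverse Event".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class WhitespaceNodes {
    // Returns the whitespace-only text nodes that are direct children of <td>,
    // with newlines escaped so they are visible when printed.
    static List<String> whitespaceInCells(String html) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse fragment as XML", e);
        }
        List<String> found = new ArrayList<>();
        NodeList cells = doc.getElementsByTagName("td");
        for (int i = 0; i < cells.getLength(); i++) {
            NodeList children = cells.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // trim() strips spaces, \n, \t and \r, so empty means whitespace-only
                if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                    found.add(child.getNodeValue().replace("\n", "\\n"));
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<table><tbody><tr><td>\n         <span>Serious Adverse Event</span>\n         </td></tr></tbody></table>";
        // Prints the two indentation runs that surround the <span>
        whitespaceInCells(html).forEach(s -> System.out.println("[" + s + "]"));
    }
}
```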


Is this expected behavior? Is there a way to disable this extra attribute for my HTML fragment?

If not, are there any alternatives?

Re: HTML To Docx adds extra spaces for the table

Posted: Wed Nov 27, 2019 7:10 am
by kapil
I was able to resolve this by removing all whitespace (spaces, \n, \t) between the HTML tags; after that the alignment was fine.
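The fix described above can be sketched with a single regular expression that removes whitespace-only runs between a '>' and the next '<' before the string is handed to the importer. This is a minimal sketch, not the poster's actual code; HtmlWhitespace and stripBetweenTags are hypothetical names.

```java
final class HtmlWhitespace {
    // Removes runs of whitespace (spaces, \n, \t, \r, ...) that sit directly
    // between a '>' and the next '<'. Gaps containing any other character,
    // such as "Serious Adverse Event", do not match and are left alone.
    static String stripBetweenTags(String html) {
        return html.replaceAll(">\\s+<", "><");
    }

    public static void main(String[] args) {
        String html = "<table>\n  <tbody>\n    <tr>\n      <td>\n        <span>Serious Adverse Event</span>\n      </td>\n    </tr>\n  </tbody>\n</table>";
        // prints <table><tbody><tr><td><span>Serious Adverse Event</span></td></tr></tbody></table>
        System.out.println(stripBetweenTags(html));
    }
}
```

Caveat: this also removes spaces that are meaningful between inline elements (for example `<b>Serious</b> <i>Event</i>` would render as "SeriousEvent") and whitespace inside <pre>, so it suits structural markup like the table here better than running text.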

The problem was not related to <w:t xml:space="preserve">.
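That conclusion fits how the attribute works: xml:space="preserve" only asks consumers to keep whitespace that is already in the w:t text; it adds none. The sketch below illustrates a common writer convention (an assumption here, not necessarily docx4j's exact rule): emit the attribute when the text starts or ends with whitespace, since consumers such as Word may otherwise drop it. WText, needsPreserve and wT are hypothetical names.

```java
final class WText {
    // Hypothetical helper: true when the text begins or ends with whitespace,
    // the case where xml:space="preserve" is needed to keep that whitespace.
    static boolean needsPreserve(String text) {
        return !text.isEmpty()
                && (Character.isWhitespace(text.charAt(0))
                    || Character.isWhitespace(text.charAt(text.length() - 1)));
    }

    // Builds a w:t element string with minimal XML escaping.
    // The attribute only protects whitespace that is already in the text.
    static String wT(String text) {
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        String attr = needsPreserve(text) ? " xml:space=\"preserve\"" : "";
        return "<w:t" + attr + ">" + escaped + "</w:t>";
    }

    public static void main(String[] args) {
        System.out.println(wT("Serious Adverse Event"));       // no attribute needed
        System.out.println(wT("   Serious Adverse Event   ")); // padding kept via preserve
    }
}
```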