Math equations

Posted: Thu May 03, 2018 8:12 pm
by rbr
Hello,

I'm trying to convert DOCX to PDF or XHTML, including math equations.
I saw that docx4j does not support OMML. I also searched the forum, but I could only find some quite old threads that don't really help me.

So I am trying to pre-process the DOCX before sending it to docx4j.
I used Apache POI to iterate over the paragraphs, and I was able to transform the equations into MathML with OMML2MML.xsl. But I have no clue how to "inject" these MathML fragments into the DOCX, or how to pass them to docx4j, if that's even possible.
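
For reference, the transformation step described above can be sketched with the JDK's built-in XSLT support. This is only a rough illustration: the class name OmmlTransformSketch and the tiny STAND_IN_XSLT stylesheet are made up for this example and handle just m:oMath and m:r. In practice you would pass in the contents of the real OMML2MML.xsl instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class OmmlTransformSketch {

    static final String M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math";
    static final String MML_NS = "http://www.w3.org/1998/Math/MathML";

    // Hypothetical stand-in for OMML2MML.xsl: it only knows m:oMath and m:r.
    static final String STAND_IN_XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:m='" + M_NS + "'"
      + " xmlns:mml='" + MML_NS + "'"
      + " exclude-result-prefixes='m'>"
      + "<xsl:template match='m:oMath'>"
      + "<mml:math><xsl:apply-templates/></mml:math>"
      + "</xsl:template>"
      + "<xsl:template match='m:r'>"
      + "<mml:mi><xsl:value-of select='m:t'/></mml:mi>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given stylesheet text to the given XML text and returns the result.
    static String transform(String xslt, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String omml = "<m:oMath xmlns:m='" + M_NS + "'><m:r><m:t>x</m:t></m:r></m:oMath>";
        System.out.println(transform(STAND_IN_XSLT, omml));
    }
}
```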

Another idea would be to replace the OMML in the DOCX with plain text (the newly generated MathML), so that docx4j does not process it. Is there some sort of hidden/ignored field that I could post-process after the docx4j transformation?

Thanks in advance!

Re: Math equations

Posted: Thu May 03, 2018 10:25 pm
by jason
I guess you saw https://stackoverflow.com/questions/447 ... 5#44809755

There's no reason you can't do something similar in docx4j (i.e. without POI).

If you want XHTML or PDF, then I guess what you want to do during the docx export is take the Office Math, convert it to MathML, and include that in your XHTML or XSL FO. If you are getting there via docx4j's XSLT, you may be able to just import|include omml2mathml in the relevant xslt.
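
To illustrate the import|include idea in isolation, here is a minimal sketch using plain JAXP rather than docx4j itself. Everything named here is an assumption made for the example: the class XslIncludeSketch, the href math-stand-in.xslt, and both tiny stylesheets. MAIN_XSLT plays the role of the docx-to-XHTML stylesheet and MATH_XSLT plays the role of the OMML-to-MathML one. A URIResolver serves the included stylesheet from memory so the example needs no files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XslIncludeSketch {

    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math";
    static final String MML_NS = "http://www.w3.org/1998/Math/MathML";

    // Hypothetical stand-in for the OMML-to-MathML stylesheet.
    static final String MATH_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='" + XSL_NS + "'"
      + " xmlns:m='" + M_NS + "' xmlns:mml='" + MML_NS + "'"
      + " exclude-result-prefixes='m'>"
      + "<xsl:template match='m:oMath'>"
      + "<mml:math><xsl:apply-templates/></mml:math>"
      + "</xsl:template>"
      + "<xsl:template match='m:r'>"
      + "<mml:mi><xsl:value-of select='m:t'/></mml:mi>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical stand-in for the main stylesheet; it pulls in the math templates.
    static final String MAIN_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='" + XSL_NS + "'"
      + " xmlns:w='" + W_NS + "' exclude-result-prefixes='w'>"
      + "<xsl:include href='math-stand-in.xslt'/>"
      + "<xsl:template match='w:p'><p><xsl:apply-templates/></p></xsl:template>"
      + "</xsl:stylesheet>";

    static String convert(String docXml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Resolve the xsl:include href from memory instead of the file system.
            factory.setURIResolver((href, base) ->
                "math-stand-in.xslt".equals(href)
                    ? new StreamSource(new StringReader(MATH_XSLT))
                    : null);
            Transformer t = factory.newTransformer(
                new StreamSource(new StringReader(MAIN_XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(docXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String para = "<w:p xmlns:w='" + W_NS + "' xmlns:m='" + M_NS + "'>"
            + "<m:oMath><m:r><m:t>x</m:t></m:r></m:oMath></w:p>";
        System.out.println(convert(para));
    }
}
```

Note that xsl:include gives the included templates the same precedence as the including stylesheet's own templates, whereas xsl:import gives them lower precedence, so which one you pick can matter if both stylesheets match the same elements.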

Re: Math equations

Posted: Fri May 04, 2018 9:03 pm
by rbr
Hi Jason, thanks for your answer. That's exactly what I'm trying to do.

jason wrote:If you want XHTML or PDF, then I guess what you want to do during the docx export is take the Office Math, convert it to MathML, and include that in your XHTML or XSL FO. If you are getting there via docx4j's XSLT, you may be able to just import|include omml2mathml in the relevant xslt.


Are you referring to this flag?
Code:
Docx4J.FLAG_EXPORT_PREFER_XS


I tried to include omml2mml.xsl inside docx2xhtml.xslt, but I have the feeling that something rewrites the tags when docx2xhtml.xslt is applied.

When I use omml2mml.xsl outside docx4j I get this, which is pretty good:
Code:
<mml:math
  xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
  xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <mml:mi>x</mml:mi>
  <mml:mo>=</mml:mo>
  <mml:mfrac>
    <mml:mrow>
      <mml:mo>-</mml:mo>
      <mml:mi>b</mml:mi>
      <mml:mo>±</mml:mo>
      <mml:msqrt>
        <mml:msup>
          <mml:mrow>
            <mml:mi>b</mml:mi>
          </mml:mrow>
          <mml:mrow>
            <mml:mn>2</mml:mn>
          </mml:mrow>
        </mml:msup>
        <mml:mo>-</mml:mo>
        <mml:mn>4</mml:mn>
        <mml:mi>a</mml:mi>
        <mml:mi>c</mml:mi>
      </mml:msqrt>
    </mml:mrow>
    <mml:mrow>
      <mml:mn>2</mml:mn>
      <mml:mi>a</mml:mi>
    </mml:mrow>
  </mml:mfrac>
</mml:math>


But with docx4j I get this:
Code:
<p class="Normal DocDefaults ">
  <mi xmlns="http://www.w3.org/1998/Math/MathML">x</mi>
  <mo xmlns="http://www.w3.org/1998/Math/MathML">=</mo>
  <mfrac xmlns="http://www.w3.org/1998/Math/MathML">
    <mrow>
      <mo>-</mo>
      <mi>b</mi>
      <mo>±</mo>
      <msqrt>
        <msup>
          <mrow>
            <mi>b</mi>
          </mrow>
          <mrow>
            <mn>2</mn>
          </mrow>
        </msup>
        <mo>-</mo>
        <mn>4</mn>
        <mi>a</mi>
        <mi>c</mi>
      </msqrt>
    </mrow>
    <mrow>
      <mn>2</mn>
      <mi>a</mi>
    </mrow>
  </mfrac>
</p>


What am I doing wrong? Or do I need to edit docx2xhtml-core.xslt?
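
The visible difference between the two outputs above is the missing math wrapper element (and the prefixes). As a quick way to check for that programmatically, here is a small sketch using the JDK's DOM parser. The class and method names (MathWrapperCheck, hasMathRoot) are made up for this example, and the two sample strings are shortened versions of the outputs above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class MathWrapperCheck {

    static final String MML_NS = "http://www.w3.org/1998/Math/MathML";

    // Returns true if the XML contains at least one math element in the MathML namespace.
    static boolean hasMathRoot(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for the namespace-based lookup below
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(MML_NS, "math").getLength() > 0;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String standalone =
            "<mml:math xmlns:mml='" + MML_NS + "'><mml:mi>x</mml:mi></mml:math>";
        String viaDocx4j =
            "<p class='Normal DocDefaults '><mi xmlns='" + MML_NS + "'>x</mi></p>";
        System.out.println(hasMathRoot(standalone)); // true
        System.out.println(hasMathRoot(viaDocx4j));  // false
    }
}
```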

Re: Math equations

Posted: Sun May 06, 2018 8:42 pm
by jason
If you convert docx to html via XSLT as things stand, you'll get:

Code:
WARN org.docx4j.convert.out.common.AbstractConversionContext .notImplemented line 44 - NOT IMPLEMENTED: support for m:oMathPara;
DEBUG org.docx4j.utils.ResourceUtils .getResource line 70 - Attempting to load: org/docx4j/org/apache/xml/serializer/docx4j_xalan_output_xml.properties
DEBUG org.docx4j.convert.out.common.AbstractConversionContext .notImplemented line 50 - <m:oMathPara xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">...</m:oMathPara>


Adding <xsl:include href="OMML2MML.xslt" /> to docx2xhtml-core.xslt is enough to get you MathML.

But for browsers to display it, uncomment the three bits shown in https://github.com/plutext/docx4j/commi ... cc3eac3fbb

Note: I renamed OMML2MML.XSL to OMML2MML.xslt (since the pom configures Maven to copy *.xslt to the output), and I changed the output encoding in OMML2MML.xslt from UTF-16 to UTF-8, to match the including XSLT.
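
As a side note on the encoding change, the sketch below shows what the encoding attribute of xsl:output controls when a transformer writes to a byte stream. It uses plain JAXP with a made-up one-template stylesheet. The class OutputEncodingSketch and the render helper are hypothetical names for this example, and it does not involve docx4j or the real OMML2MML.xslt.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class OutputEncodingSketch {

    // Builds a tiny stylesheet declaring the given output encoding and
    // returns the raw bytes produced by transforming a dummy document.
    static byte[] render(String encoding) {
        String xslt =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' encoding='" + encoding + "'/>"
          + "<xsl:template match='/'><mo>\u00B1</mo></xsl:template>"
          + "</xsl:stylesheet>";
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = render("UTF-8");
        byte[] utf16 = render("UTF-16");
        // The same result tree serializes to different byte sequences.
        System.out.println("UTF-8 bytes:  " + utf8.length);
        System.out.println("UTF-16 bytes: " + utf16.length);
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```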