Issue with MathML

Posted: Thu Aug 11, 2022 7:41 pm
by Milqn
Hi,
Nice work you have done with Docx4j!

I need to convert HTML containing MathML to docx, but the MathML tags are stripped during conversion.

How could I make this work?
Hope you can help me.

Thanks
Milqn

Re: Issue with MathML

Posted: Fri Aug 12, 2022 7:50 pm
by jason
There is XSLT for converting OMML to MathML, but maybe not for the converse.

Google reveals python and npm libraries called mathml2omml (and there may be others); if one of these works well enough on your MathML, you could integrate it into docx4j-ImportXHTML.

Re: Issue with MathML

Posted: Fri Aug 12, 2022 10:06 pm
by Milqn
Thank you for your answer!

I also thought that my issue could be due to OMML.

The MathML tags are not preserved, and hence LibreOffice can't display the math either.

Re: Issue with MathML

Posted: Tue Aug 16, 2022 7:37 am
by Milqn
I hope you don't mind my asking: what would be the best way to include MML2OMML?
I have found the MML2OMML.XSL stylesheet.

How would you do it?

Thanks a lot

Re: Issue with MathML

Posted: Fri Aug 19, 2022 4:41 am
by Milqn
Any assistance would be greatly appreciated.

Re: Issue with MathML

Posted: Tue Aug 23, 2022 9:57 pm
by jason
Here's a sketch of how to do it to help you get started, mostly untested, but based on importing the following sample XHTML:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>MathML in XHTML</title>
</head>
<body>


  <p>
    Follows:
    <math xmlns="http://www.w3.org/1998/Math/MathML">

      <mfrac>
        <mn>1</mn>
        <msqrt>
          <mn>2</mn>
        </msqrt>
      </mfrac>
    </math>
  </p>


</body>
</html>


In XHTMLImporterImpl, at approx line 1383, add and flesh out the following block of code:

    } else if (e.getNodeName().equals("math")) {

        // TODO: handle me
        System.out.println("TODO: Handle mathml \n" + XmlUtils.w3CDomNodeToString(e));

        // Prepare to transform Element e
        Templates xslt = null; // TODO: load your mathml2omml.xslt here

        try {
            // Use the JAXBResult constructor which takes an Unmarshaller, rather than
            // a JAXBContext, so we can set JaxbValidationEventHandler
            JAXBContext jc = Context.jc;
            Unmarshaller u = jc.createUnmarshaller();
            u.setEventHandler(new org.docx4j.jaxb.JaxbValidationEventHandler());
            jakarta.xml.bind.util.JAXBResult result = new jakarta.xml.bind.util.JAXBResult(u);

            XmlUtils.transform(new DOMSource(e), xslt, null, result);

            // What happened?
            Object o = result.getResult();

            // Attach it to the document
            this.contentContextStack.peek().getContent().add(o); // or addAll

        } catch (JAXBException e1) {
            e1.printStackTrace();
        }

        return;

 
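One thing to watch in the test above: getNodeName() returns the qualified name, so equals("math") would miss a prefixed element such as <mml:math>. Below is a minimal stand-alone sketch of a namespace-aware check. It assumes the DOM was parsed namespace-aware (I have not checked how XHTMLImporterImpl builds its DOM), and the MathMLDetector class with its isMathML and parseRoot helpers are hypothetical names, not part of docx4j.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, for illustration only; not part of docx4j.
public class MathMLDetector {

    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    /** True if e is a MathML math element, whatever prefix it uses. */
    static boolean isMathML(Element e) {
        return MATHML_NS.equals(e.getNamespaceURI())
                && "math".equals(e.getLocalName());
    }

    /** Parses an XML snippet namespace-aware and returns its root element. */
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // otherwise getLocalName() returns null
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element plain = parseRoot(
                "<math xmlns='http://www.w3.org/1998/Math/MathML'><mn>1</mn></math>");
        Element prefixed = parseRoot(
                "<mml:math xmlns:mml='http://www.w3.org/1998/Math/MathML'>"
                + "<mml:mn>1</mml:mn></mml:math>");
        // Both are MathML, but only the first has the node name "math"
        System.out.println(plain.getNodeName() + " -> " + isMathML(plain));
        System.out.println(prefixed.getNodeName() + " -> " + isMathML(prefixed));
    }
}
```

If your XHTML always declares MathML as the default namespace on the math element, as in the sample, the plain getNodeName() comparison is enough.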

Re: Issue with MathML

Posted: Wed Aug 24, 2022 10:41 pm
by jason
The following proof of concept code works on my sample XHTML:

    } else if (e.getNodeName().equals("math")) {

        // Imports this block relies on, if not already present in XHTMLImporterImpl:
        // javax.xml.transform.Source, javax.xml.transform.Templates,
        // javax.xml.transform.dom.DOMSource, javax.xml.transform.stream.StreamSource,
        // org.docx4j.utils.ResourceUtils, org.docx4j.math.CTOMathPara

        System.out.println("Handling mathml \n" + XmlUtils.w3CDomNodeToString(e));

        try {
            // Prepare to transform Element e
            Source xsltSource = new StreamSource(
                    ResourceUtils.getResource("mml2omml.xsl")
                    ); // https://raw.githubusercontent.com/Marti ... l2omml.xsl
            /* You need to add the template:
             *
             *   <xsl:template match="/|*">
             *     <oMath>
             *       <xsl:apply-templates mode="mml" />
             *     </oMath>
             *   </xsl:template>
             */

            Templates xslt = XmlUtils.getTransformerTemplate(xsltSource);

            // Use the JAXBResult constructor which takes an Unmarshaller, rather than
            // a JAXBContext, so we can set JaxbValidationEventHandler
            JAXBContext jc = Context.jc;
            Unmarshaller u = jc.createUnmarshaller();
            u.setEventHandler(new org.docx4j.jaxb.JaxbValidationEventHandler());
            jakarta.xml.bind.util.JAXBResult result = new jakarta.xml.bind.util.JAXBResult(u);

            XmlUtils.transform(new DOMSource(e), xslt, null, result);

            // What happened? The added template makes oMath the root of the result
            org.docx4j.math.CTOMath math = (org.docx4j.math.CTOMath) XmlUtils.unwrap(result.getResult());

            org.docx4j.math.ObjectFactory mathObjectFactory = new org.docx4j.math.ObjectFactory();
            // Create object for oMathPara (wrapped in JAXBElement)
            CTOMathPara omathpara = mathObjectFactory.createCTOMathPara();
            JAXBElement<org.docx4j.math.CTOMathPara> omathparaWrapped = mathObjectFactory.createOMathPara(omathpara);

            omathpara.getOMath().add(math);

            // Put the oMathPara in a paragraph of its own
            P wP = new P();
            wP.getContent().add(omathparaWrapped);

            // Attach it to the document
            this.contentContextStack.peek().getContent().add(wP);

        } catch (Exception e1) {
            throw new Docx4JException("Error processing MathML", e1);
        }

        return;

 


Note that the heavy lifting is done by https://raw.githubusercontent.com/Marti ... l2omml.xsl
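
To see what the added template does without docx4j on the classpath, here is a toy stand-alone sketch using only the JDK's XSLT processor. The template matching "/|*" is a default-mode entry point: it wraps the output in oMath and hands the children to the mode="mml" templates. The stylesheet below is NOT the real mml2omml.xsl; it is a made-up stand-in that only knows mn, and the ToyMml2Omml class and its convert method are hypothetical names. I use an explicit m: prefix so the toy does not depend on a default namespace declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Toy illustration only; the real conversion is done by mml2omml.xsl.
public class ToyMml2Omml {

    // Made-up miniature stylesheet: a wrapper template in the default mode,
    // plus a single mode="mml" template that turns mn into an OMML run.
    static final String TOY_XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:mml='http://www.w3.org/1998/Math/MathML'"
        + " xmlns:m='http://schemas.openxmlformats.org/officeDocument/2006/math'"
        + " exclude-result-prefixes='mml'>"
        + "<xsl:template match='/|*'>"
        + "<m:oMath><xsl:apply-templates mode='mml'/></m:oMath>"
        + "</xsl:template>"
        + "<xsl:template match='mml:mn' mode='mml'>"
        + "<m:r><m:t><xsl:value-of select='.'/></m:t></m:r>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the toy stylesheet over a MathML string and returns the result XML. */
    static String convert(String mathml) throws Exception {
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(TOY_XSLT)));
        Transformer t = templates.newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(mathml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The document root matches "/|*", so the result is rooted at m:oMath
        System.out.println(convert(
                "<math xmlns='http://www.w3.org/1998/Math/MathML'><mn>1</mn></math>"));
    }
}
```

In the proof of concept above, that single oMath root is what lets the JAXBResult be unwrapped to a CTOMath.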

Re: Issue with MathML

Posted: Thu Aug 25, 2022 9:03 am
by infomladen
That's what I'm looking for. Great work.
Thanks a lot, Jason.