Issue with MathML

Posted: Thu Aug 11, 2022 7:41 pm
by Milqn
Hi,
Nice work you have done with Docx4j!

I need to convert HTML containing MathML to docx, but the MathML tags are stripped during conversion.

How could I make this work?
Hope you can help me.

Thanks
Milqn

Re: Issue with MathML

Posted: Fri Aug 12, 2022 7:50 pm
by jason
There is XSLT for converting OMML to MathML, but maybe not for the converse.

Google reveals python and npm libraries called mathml2omml (and there may be others); if one of these works well enough on your MathML, you could integrate it into docx4j-ImportXHTML.

Re: Issue with MathML

Posted: Fri Aug 12, 2022 10:06 pm
by Milqn
Thank you for your answer!

I also suspected that my issue was related to OMML.

The MathML tags are not preserved, so LibreOffice can't display the math either.

Re: Issue with MathML

Posted: Tue Aug 16, 2022 7:37 am
by Milqn
I hope you don't mind me asking: what would be the best way to include MML2OMML?
I have found the MML2OMML.XSL stylesheet.

How would you do it?

Thanks a lot

Re: Issue with MathML

Posted: Fri Aug 19, 2022 4:41 am
by Milqn
Any assistance would be greatly appreciated.


Re: Issue with MathML

Posted: Tue Aug 23, 2022 9:57 pm
by jason
Here's a mostly untested sketch to help you get started, based on importing the following sample XHTML:

Code: Select all
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>MathML in XHTML</title>
</head>
<body>


  <p>
    Follows:
    <math xmlns="http://www.w3.org/1998/Math/MathML">

      <mfrac>
        <mn>1</mn>
        <msqrt>
          <mn>2</mn>
        </msqrt>
      </mfrac>
    </math>
  </p>


</body>
</html>


In XHTMLImporterImpl, at approx line 1383, add and flesh out the following block of code:

Code: Select all

} else if (e.getNodeName().equals("math")) {

    // handle me
    System.out.println("TODO: Handle mathml \n\r" + XmlUtils.w3CDomNodeToString(e));

    // Prepare to transform Element e
    Templates xslt = null; // TODO: load your mathml2omml.xslt here

    try {
        // Use constructor which takes Unmarshaller, rather than JAXBContext,
        // so we can set JaxbValidationEventHandler
        JAXBContext jc = Context.jc;
        Unmarshaller u = jc.createUnmarshaller();
        u.setEventHandler(new org.docx4j.jaxb.JaxbValidationEventHandler());
        jakarta.xml.bind.util.JAXBResult result = new jakarta.xml.bind.util.JAXBResult(u);

        XmlUtils.transform(new DOMSource(e), xslt, null, result);

        // What happened?
        Object o = result.getResult();

        // Attach it to the document (untested; the result may first need
        // unwrapping and placing inside a paragraph)
        this.contentContextStack.peek().getContent().add(o); // or addAll

    } catch (JAXBException e1) {
        e1.printStackTrace();
    }

    return;

 
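One thing to watch in the branch condition above: getNodeName() returns the qualified name, so it equals "math" only when the MathML element is unprefixed, as in the sample. The following is a minimal sketch, using only JDK classes, of a namespace-based lookup that would also match a prefixed <m:math>. The class MathElementCheck and its helper are hypothetical and not part of docx4j.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class MathElementCheck {

    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    // Returns the first MathML <math> element in the document, or null if there is none.
    static Element firstMath(String xhtml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for namespace-based lookup
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        NodeList hits = doc.getElementsByTagNameNS(MATHML_NS, "math");
        return hits.getLength() == 0 ? null : (Element) hits.item(0);
    }

    public static void main(String[] args) throws Exception {
        String prefixed =
            "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>"
          + "<m:math xmlns:m='" + MATHML_NS + "'><m:mn>1</m:mn></m:math>"
          + "</p></body></html>";
        Element math = firstMath(prefixed);
        // The node name carries the prefix; the local name does not.
        System.out.println(math.getNodeName() + " / " + math.getLocalName());
    }
}
```

If your input always uses the default-namespace form shown in the sample XHTML, the getNodeName() test is enough; comparing getLocalName() and getNamespaceURI() is just the more defensive variant.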

Re: Issue with MathML

Posted: Wed Aug 24, 2022 10:41 pm
by jason
The following proof of concept code works on my sample XHTML:

Code: Select all
} else if (e.getNodeName().equals("math")) {

    // handle me
    System.out.println("Handling mathml \n\r" + XmlUtils.w3CDomNodeToString(e));

    try {
        // Prepare to transform Element e
        Source xsltSource = new StreamSource(
                ResourceUtils.getResource("mml2omml.xsl")
        ); // https://raw.githubusercontent.com/Marti ... l2omml.xsl
        /* You need to add the template:
         *
         *   <xsl:template match="/|*">
         *     <oMath>
         *       <xsl:apply-templates mode="mml" />
         *     </oMath>
         *   </xsl:template>
         */

        Templates xslt = XmlUtils.getTransformerTemplate(xsltSource);

        // Use constructor which takes Unmarshaller, rather than JAXBContext,
        // so we can set JaxbValidationEventHandler
        JAXBContext jc = Context.jc;
        Unmarshaller u = jc.createUnmarshaller();
        u.setEventHandler(new org.docx4j.jaxb.JaxbValidationEventHandler());
        jakarta.xml.bind.util.JAXBResult result = new jakarta.xml.bind.util.JAXBResult(u);

        XmlUtils.transform(new DOMSource(e), xslt, null, result);

        // What happened?
        org.docx4j.math.CTOMath math = (org.docx4j.math.CTOMath) XmlUtils.unwrap(result.getResult());

        org.docx4j.math.ObjectFactory mathObjectFactory = new org.docx4j.math.ObjectFactory();
        // Create object for oMathPara (wrapped in JAXBElement)
        org.docx4j.math.CTOMathPara omathpara = mathObjectFactory.createCTOMathPara();
        JAXBElement<org.docx4j.math.CTOMathPara> omathparaWrapped = mathObjectFactory.createOMathPara(omathpara);

        omathpara.getOMath().add(math);

        // Put the math paragraph in its own w:p
        P wP = new P();
        wP.getContent().add(omathparaWrapped);

        // Attach it to the document
        this.contentContextStack.peek().getContent().add(wP);

    } catch (Exception e1) {
        throw new Docx4JException("Error processing MathML", e1);
    }

    return;

 


Note that the heavy lifting is done by https://raw.githubusercontent.com/Marti ... l2omml.xsl
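To see what the added wrapper template does in isolation, here is a minimal sketch using only the JDK's javax.xml.transform. The stylesheet in it is a hypothetical stand-in, not the real mml2omml.xsl: it has the same match="/|*" wrapper but only knows how to turn <mn> into a run, so the fraction and root structure is flattened. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; it does not use docx4j at all.
class MathWrapSketch {

    // Stand-in stylesheet (an assumption, NOT mml2omml.xsl): the unmoded
    // template wraps the output in m:oMath and hands the children to mode "mml".
    static final String STAND_IN_XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:mml='http://www.w3.org/1998/Math/MathML'"
      + " xmlns:m='http://schemas.openxmlformats.org/officeDocument/2006/math'"
      + " exclude-result-prefixes='mml'>"
      + "<xsl:template match='/|*'>"
      + "<m:oMath><xsl:apply-templates mode='mml'/></m:oMath>"
      + "</xsl:template>"
      + "<xsl:template match='mml:mn' mode='mml'>"
      + "<m:r><m:t><xsl:value-of select='.'/></m:t></m:r>"
      + "</xsl:template>"
      + "<xsl:template match='text()' mode='mml'/>" // drop stray whitespace
      + "</xsl:stylesheet>";

    // Compiles the stylesheet and applies it to a MathML string.
    static String toOmml(String mathml) throws Exception {
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(STAND_IN_XSLT)));
        Transformer transformer = templates.newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(mathml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String mathml = "<math xmlns='http://www.w3.org/1998/Math/MathML'>"
                + "<mfrac><mn>1</mn><msqrt><mn>2</mn></msqrt></mfrac></math>";
        System.out.println(toOmml(mathml));
    }
}
```

Without the unmoded wrapper template nothing would dispatch into mode "mml", which is presumably why the real stylesheet needs the extra template before it produces a single oMath root that JAXB can unmarshal.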

Re: Issue with MathML

Posted: Thu Aug 25, 2022 9:03 am
by infomladen
That's what I'm looking for. Great work.
Thanks a lot, Jason.

Re: Issue with MathML

Posted: Fri Feb 07, 2025 3:20 am
by pyaeth
Hello,

The issue is now fixed if you do the following:
- Use a later version of docx4j-ImportXHTML (I only needed the core library, so that is the only one I referenced in my project):
Code: Select all
<dependency>
  <groupId>org.docx4j</groupId>
  <artifactId>docx4j-ImportXHTML-core</artifactId>
  <version>11.5.0</version>
</dependency>

Note: you can browse the docx4j-ImportXHTML GitHub repository at https://github.com/plutext/docx4j-ImportXHTML/tree/VERSION_11_4_8/docx4j-ImportXHTML-core and check the XHTMLImporterImpl file in the version you are interested in. Searching the file for "math" (including the quotes) should take you to the relevant code.
- Use the correct mml2omml.xsl file (make sure the name and extension are exactly right; otherwise you will get an error at conversion time saying the file wasn't found on the classpath). The correct one was shared earlier by jason:
Note that the heavy lifting is done by https://raw.githubusercontent.com/Marti ... l2omml.xsl

To that file you must add:
Code: Select all
<xsl:template match="/|*">
  <oMath>
    <xsl:apply-templates mode="mml" />
  </oMath>
</xsl:template>

I added it right after
Code: Select all
<xsl:variable name="StrLCAlphabet">abcdefghijklmnopqrstuvwxyz</xsl:variable>
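To rule out the "file not found on classpath" error before running a conversion, a quick check along these lines may help. It is a minimal sketch that assumes the importer loads the stylesheet by the bare name mml2omml.xsl from the classpath root, as the proof-of-concept code earlier in this thread does; the class StylesheetCheck is hypothetical and not part of docx4j.

```java
import java.net.URL;

// Hypothetical helper, for illustration only.
class StylesheetCheck {

    // Returns true if the named resource is visible to the current class loader.
    static boolean onClasspath(String name) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        URL url = cl.getResource(name);
        return url != null;
    }

    public static void main(String[] args) {
        // Name and extension must match exactly; resource names are case-sensitive
        // inside jars.
        String name = "mml2omml.xsl";
        System.out.println(name + (onClasspath(name)
                ? " found on classpath"
                : " NOT found on classpath"));
    }
}
```

In a Maven project, placing the file in src/main/resources is the usual way to get it onto the classpath root.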

Cheers,
A.