
Escaping bug in creating Hyperlink

Posted: Mon Jun 29, 2015 8:43 pm
by martijnc
Hyperlinks whose link text contains (escaped) angle brackets do not work. In the current version, 3.2.2, the link text is escaped manually, and only for ampersands.
Code:
<a href="http://www.google.com">&lt;Smile 2 18-06-2015 09:53&gt;</a>


The document will still be generated, but without the link and linktext.

The exception:
29 Jun 11:22:03.274 ERROR o.d.c.in.xhtml.XHTMLImporterImpl - Dodgy link text: '<Smile 2 18-06-2015 09:53>'
javax.xml.bind.UnmarshalException: null
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335) ~[na:1.8.0_45]
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(UnmarshallerImpl.java:563) ~[na:1.8.0_45]
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:249) ~[na:1.8.0_45]
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:214) ~[na:1.8.0_45]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:157) ~[na:1.8.0_45]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:125) ~[na:1.8.0_45]
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:433) ~[docx4j-3.2.1.jar:na]
at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:413) ~[docx4j-3.2.1.jar:na]
at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.createHyperlink(XHTMLImporterImpl.java:2136) [docx4j-ImportXHTML-3.2.2.jar:na]
at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.processInlineBox(XHTMLImporterImpl.java:1667) [docx4j-ImportXHTML-3.2.2.jar:na]
at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.traverse(XHTMLImporterImpl.java:1232) [docx4j-ImportXHTML-3.2.2.jar:na]
at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.traverse(XHTMLImporterImpl.java:1216) [docx4j-ImportXHTML-3.2.2.jar:na]
at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.traverse(XHTMLImporterImpl.java:1216) [docx4j-ImportXHTML-3.2.2.jar:na]
at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.traverse(XHTMLImporterImpl.java:1216) [docx4j-ImportXHTML-3.2.2.jar:na]
........
Caused by: org.xml.sax.SAXParseException: Element type "Smile" must be followed by either attribute specifications, ">" or "/>".
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239) ~[na:1.8.0_45]
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649) ~[na:1.8.0_45]
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:243) ~[na:1.8.0_45]
... 101 common frames omitted
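
The parse failure in the trace can be reproduced without docx4j. The following is a minimal sketch using only the JDK's XML parser; the class name, the `isWellFormed` helper, and the plain `<t>` wrapper element are assumptions made for illustration and are not part of docx4j. It concatenates the link text into an XML string, as the importer does, and reports whether the result still parses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class LinkTextParseDemo {

    // Hypothetical helper: concatenates the text into a <t> element and
    // reports whether the resulting string is well-formed XML.
    static boolean isWellFormed(String linkText) throws Exception {
        String xml = "<t>" + linkText + "</t>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Plain text parses.
        System.out.println(isWellFormed("Smile 2 18-06-2015 09:53"));          // true
        // Decoded link text: "<Smile 2 ..." is read as a malformed start tag.
        System.out.println(isWellFormed("<Smile 2 18-06-2015 09:53>"));        // false
        // The same text with the angle brackets escaped parses again.
        System.out.println(isWellFormed("&lt;Smile 2 18-06-2015 09:53&gt;"));  // true
    }
}
```

The second call corresponds to the situation in the log: the importer receives the already-decoded text `<Smile 2 18-06-2015 09:53>` and pastes it into the XML string unescaped.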

Instead of escaping only ampersands at the start of the method, I changed the code to unmarshal the hyperlink XML without the link text and then set the link text on the resulting object, so that it is escaped automatically when the document is marshalled.

Code:
private Hyperlink createHyperlink(String url, RPr rPr, String linkText, RelationshipsPart rp) {

        //        if (linkText.contains("&") && !linkText.contains("&amp;")) {
        //            // escape them so we can unmarshall
        //            linkText = linkText.replace("&", "&amp;");
        //        }

        try {
            String hpl = null;

            if (url.startsWith("#")) { // Internal link --> w:anchor

                //                hpl = "<w:hyperlink w:anchor=\"" + bookmarkHelper.anchorToBookmarkName(bookmarkNamePrefix, url)
                //                        + "\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" "
                //                        + "xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" >" + "<w:r>" + "<w:t>" + linkText + "</w:t>"
                //                        + "</w:r>" + "</w:hyperlink>";
                hpl = "<w:hyperlink w:anchor=\"" + bookmarkHelper.anchorToBookmarkName(bookmarkNamePrefix, url)
                        + "\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" "
                        + "xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" >" + "<w:r>" + "<w:t></w:t>" + "</w:r>"
                        + "</w:hyperlink>";

            } else {                   // External link --> r:id

                // We need to add a relationship to word/_rels/document.xml.rels
                // but since its external, we don't use the
                // usual wordMLPackage.getMainDocumentPart().addTargetPart
                // mechanism
                org.docx4j.relationships.ObjectFactory factory = new org.docx4j.relationships.ObjectFactory();

                org.docx4j.relationships.Relationship rel = factory.createRelationship();
                rel.setType(Namespaces.HYPERLINK);
                rel.setTarget(url);
                rel.setTargetMode("External");

                rp.addRelationship(rel);

                // addRelationship sets the rel's @Id

                //                hpl = "<w:hyperlink r:id=\"" + rel.getId() + "\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" "
                //                        + "xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" >" + "<w:r>" + "<w:t>" + "</w:t>" + linkText + "</w:r>"
                //                        + "</w:hyperlink>";

                hpl = "<w:hyperlink r:id=\"" + rel.getId() + "\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" "
                        + "xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" >" + "<w:r>" + "<w:t>" + "</w:t>" + "</w:r>"
                        + "</w:hyperlink>";
            }

            Hyperlink hyperlink = (Hyperlink) XmlUtils.unmarshalString(hpl);
            R r = (R) hyperlink.getContent().get(0);
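            // Set the link text on the unmarshalled object instead of in the XML string,
            // so that special characters are escaped when the document is marshalled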
            ((JAXBElement<Text>) r.getContent().get(0)).getValue().setValue(linkText);
            r.setRPr(rPr);

            // Style the hyperlink with hyperlinkStyleId,
            // unless another style is already in use
            P currentP = getCurrentParagraph(false);

            //          System.out.println("p/h:" + XmlUtils.marshaltoString(currentP));

            //          if (currentP.getPPr()!=null
            //                  && currentP.getPPr().getRPr()!=null
            //                  && currentP.getPPr().getRPr().getRStyle()!=null) {
            //             
            //              // Respect p/ppr/rpr
            //             
            //          } else

            if (rPr.getRStyle() == null // don't set it if its set already
                    && hyperlinkStyleId != null) {
                RStyle rStyle = Context.getWmlObjectFactory().createRStyle();
                rStyle.setVal(hyperlinkStyleId);
                rPr.setRStyle(rStyle);
            }
            return hyperlink;

        } catch (Exception e) {
            // eg  org.xml.sax.SAXParseException: The reference to entity "ballot_id" must end with the ';' delimiter.
            log.error("Dodgy link text: '" + linkText + "'", e);
            return null;
        }

    }
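
The same idea (parse a template with an empty text element, then set the text through the object model) can be tried in isolation with the JDK's DOM API. This is only a sketch of the technique, not the docx4j code path: the class name, the `buildLink` and `readLinkText` helpers, and the un-namespaced element names are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SetTextAfterParseDemo {

    // Hypothetical helper: parses a fixed template and sets the link text
    // on the parsed tree, so the text never passes through the XML parser.
    static String buildLink(String linkText) throws Exception {
        String template = "<hyperlink><r><t></t></r></hyperlink>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(template)));
        doc.getElementsByTagName("t").item(0).setTextContent(linkText);

        // Serializing the tree escapes markup characters in text nodes.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Hypothetical helper: parses the serialized XML and returns the text of <t>.
    static String readLinkText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("t").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = buildLink("<Smile 2 18-06-2015 09:53>");
        System.out.println(xml);               // serialized XML with the brackets escaped
        System.out.println(readLinkText(xml)); // the original link text, round-tripped
    }
}
```

Because the text is set on the tree rather than concatenated into the string, no manual escaping is needed for `<`, `>` or `&`.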


Will this fix (or another fix for this problem) be in 3.2.3, and is there already an ETA for that release? In the meantime I'll work with a patched version of the jar, but I would rather use an 'official' one.