Escaping bug in creating Hyperlink
Posted: Mon Jun 29, 2015 8:43 pm
Hyperlinks with (escaped) angle brackets in the link text do not work. In the current version, 3.2.2, link text is only escaped manually, and only for ampersands.
Code:
<a href="http://www.google.com"><Smile 2 18-06-2015 09:53></a>
The document is still generated, but without the link and its link text.
The exception:
29 Jun 11:22:03.274 ERROR o.d.c.in.xhtml.XHTMLImporterImpl - Dodgy link text: '<Smile 2 18-06-2015 09:53>'
javax.xml.bind.UnmarshalException: null
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335) ~[na:1.8.0_45]
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(UnmarshallerImpl.java:563) ~[na:1.8.0_45]
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:249) ~[na:1.8.0_45]
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:214) ~[na:1.8.0_45]
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:157) ~[na:1.8.0_45]
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:125) ~[na:1.8.0_45]
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:433) ~[docx4j-3.2.1.jar:na]
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:413) ~[docx4j-3.2.1.jar:na]
    at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.createHyperlink(XHTMLImporterImpl.java:2136) [docx4j-ImportXHTML-3.2.2.jar:na]
    at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.processInlineBox(XHTMLImporterImpl.java:1667) [docx4j-ImportXHTML-3.2.2.jar:na]
    at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.traverse(XHTMLImporterImpl.java:1232) [docx4j-ImportXHTML-3.2.2.jar:na]
    at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.traverse(XHTMLImporterImpl.java:1216) [docx4j-ImportXHTML-3.2.2.jar:na]
    at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.traverse(XHTMLImporterImpl.java:1216) [docx4j-ImportXHTML-3.2.2.jar:na]
    at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.traverse(XHTMLImporterImpl.java:1216) [docx4j-ImportXHTML-3.2.2.jar:na]
........
Caused by: org.xml.sax.SAXParseException: Element type "Smile" must be followed by either attribute specifications, ">" or "/>".
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239) ~[na:1.8.0_45]
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649) ~[na:1.8.0_45]
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:243) ~[na:1.8.0_45]
    ... 101 common frames omitted
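The parse failure in the trace can probably be reproduced without docx4j. The sketch below is only an illustration using the plain JDK parser: the class name `LinkTextParseDemo`, the method `parses`, and the simplified `<t>` element are my own assumptions, not docx4j code. It concatenates the link text into an XML string, as the stack trace suggests `createHyperlink` does, and reports whether the result is well-formed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of docx4j.
class LinkTextParseDemo {

    /** Returns true if XML built by naive string concatenation is well-formed. */
    static boolean parses(String linkText) {
        // Stand-in for the "<w:t>" + linkText + "</w:t>" concatenation.
        String xml = "<t>" + linkText + "</t>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Not well-formed, e.g. "<Smile 2 ..." is read as a broken start tag.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Unexpected parser setup or I/O failure", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<Smile 2 18-06-2015 09:53>"));
        System.out.println(parses("&lt;Smile 2 18-06-2015 09:53&gt;"));
    }
}
```

Run as written, this should print `false` for the raw link text and `true` for the escaped form, which matches the `SAXParseException` above: the parser treats `<Smile` as the start of an element.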
Instead of only escaping ampersands at the start of the method, I changed the code to unmarshal the hyperlink without the link text and then set the link text on the resulting object, so that it is escaped automatically.
Code:
private Hyperlink createHyperlink(String url, RPr rPr, String linkText, RelationshipsPart rp) {

    // Old approach (removed): escape ampersands by hand before concatenating.
    // if (linkText.contains("&") && !linkText.contains("&amp;")) {
    //     // escape them so we can unmarshall
    //     linkText = linkText.replace("&", "&amp;");
    // }

    try {
        String hpl = null;

        if (url.startsWith("#")) { // Internal link --> w:anchor
            // Old: link text concatenated into the XML string
            // hpl = "<w:hyperlink w:anchor=\"" + bookmarkHelper.anchorToBookmarkName(bookmarkNamePrefix, url)
            //     + "\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" "
            //     + "xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" >" + "<w:r>" + "<w:t>" + linkText + "</w:t>"
            //     + "</w:r>" + "</w:hyperlink>";

            // New: unmarshal with an empty w:t; the text is set on the object below
            hpl = "<w:hyperlink w:anchor=\"" + bookmarkHelper.anchorToBookmarkName(bookmarkNamePrefix, url)
                + "\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" "
                + "xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" >" + "<w:r>" + "<w:t></w:t>" + "</w:r>"
                + "</w:hyperlink>";
        } else { // External link --> r:id
            // We need to add a relationship to word/_rels/document.xml.rels
            // but since it's external, we don't use the
            // usual wordMLPackage.getMainDocumentPart().addTargetPart
            // mechanism
            org.docx4j.relationships.ObjectFactory factory = new org.docx4j.relationships.ObjectFactory();
            org.docx4j.relationships.Relationship rel = factory.createRelationship();
            rel.setType(Namespaces.HYPERLINK);
            rel.setTarget(url);
            rel.setTargetMode("External");
            rp.addRelationship(rel);
            // addRelationship sets the rel's @Id

            // Old: link text concatenated into the XML string
            // hpl = "<w:hyperlink r:id=\"" + rel.getId() + "\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" "
            //     + "xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" >" + "<w:r>" + "<w:t>" + linkText + "</w:t>" + "</w:r>"
            //     + "</w:hyperlink>";

            // New: unmarshal with an empty w:t; the text is set on the object below
            hpl = "<w:hyperlink r:id=\"" + rel.getId() + "\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" "
                + "xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" >" + "<w:r>" + "<w:t>" + "</w:t>" + "</w:r>"
                + "</w:hyperlink>";
        }

        Hyperlink hyperlink = (Hyperlink) XmlUtils.unmarshalString(hpl);
        R r = (R) hyperlink.getContent().get(0);

        // Set the link text on the unmarshalled Text object instead of in the
        // XML string, so <, > and & are escaped when the document is marshalled
        ((JAXBElement<Text>) r.getContent().get(0)).getValue().setValue(linkText);
        r.setRPr(rPr);

        // Style the hyperlink with hyperlinkStyleId,
        // unless another style is already in use
        P currentP = getCurrentParagraph(false);
        // System.out.println("p/h:" + XmlUtils.marshaltoString(currentP));
        // if (currentP.getPPr()!=null
        //     && currentP.getPPr().getRPr()!=null
        //     && currentP.getPPr().getRPr().getRStyle()!=null) {
        //
        //     // Respect p/ppr/rpr
        //
        // } else
        if (rPr.getRStyle() == null // don't set it if it's set already
                && hyperlinkStyleId != null) {
            RStyle rStyle = Context.getWmlObjectFactory().createRStyle();
            rStyle.setVal(hyperlinkStyleId);
            rPr.setRStyle(rStyle);
        }

        return hyperlink;

    } catch (Exception e) {
        // eg org.xml.sax.SAXParseException: The reference to entity "ballot_id" must end with the ';' delimiter.
        log.error("Dodgy link text: '" + linkText + "'", e);
        return null;
    }
}
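To show why setting the text on the object sidesteps the escaping problem, here is a minimal standalone sketch of the same idea using only the JDK's DOM and Transformer APIs rather than docx4j/JAXB. The class `SetTextAfterParseDemo`, its methods, and the namespace-free `<hyperlink><r><t>` skeleton are assumptions made for illustration. The skeleton is parsed with an empty text element, the link text is set through the object model, and the serializer takes care of escaping.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; a DOM analogue of the patched method, not docx4j code.
class SetTextAfterParseDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    /** Parses a skeleton with an empty text element, then sets the text on the node. */
    static String buildLink(String linkText) {
        Document doc = parse("<hyperlink><r><t></t></r></hyperlink>");
        // The text never passes through the parser, so it cannot break it.
        doc.getElementsByTagName("t").item(0).setTextContent(linkText);
        return serialize(doc);
    }

    /** Reads the link text back out of serialized XML. */
    static String readLinkText(String xml) {
        return parse(xml).getElementsByTagName("t").item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = buildLink("<Smile 2 18-06-2015 09:53>");
        System.out.println(xml);
        System.out.println(readLinkText(xml));
    }
}
```

The serialized XML should contain the angle brackets in escaped form, and reading it back should return the original link text unchanged. The patched `createHyperlink` relies on JAXB doing the equivalent when the document is marshalled.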
Will this fix (or another fix for the same problem) be in 3.2.3, and is there already an ETA for that release? In the meantime I'll work with a patched version of the jar, but I would rather use an 'official' one.