Plutext

Posted: **Thu Dec 22, 2016 9:23 pm**

Hi everyone. I'm trying to convert HTML to DOCX, but I always get errors.
I have read a lot of forum discussions, but I have no idea how to solve the problem.
I'm calling a class in a separate thread.
My code:

Code:

    @Override
    public void generateDoc(final String codSocietaSoa, final int docType) throws Exception {
        logger.debug("TEXT REPLACER SERVICE - start generateDoc");
        Callable c = new Callable() {
            @Override
            public Object call() throws Exception {
                logger.debug("TEXT REPLACER SERVICE - start async thread");
                try {
                    // GET TEMPLATE FROM DB
                    // Other operations...
                    HeaderFooterGenerator headerFooterGenerator = new HeaderFooterGenerator();
                    ByteArrayOutputStream baosFinal = headerFooterGenerator.replaceAll(bais, imageReplacingMap, graphTagByteMap, htmlReplacingMap);
                    // CLOSE
                    // operations...
                    // SAVE BYTE ARRAY TO DB
                    // operations...
                } catch (Exception e) {
                    e.printStackTrace();
                    saveDocumentService.insertLogError(codSocietaSoa, docType, e.getMessage());
                }
                return null;
            }
        };
        THREAD_POOL.submit(c);
    }

This code calls the generator that replaces the HTML. (The name HeaderFooterGenerator is not quite right, I know.)

Code:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.commons.io.IOUtils;
    import org.apache.log4j.Logger;
    import org.docx4j.TraversalUtil;
    import org.docx4j.UnitsOfMeasurement;
    import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
    import org.docx4j.dml.wordprocessingDrawing.Inline;
    import org.docx4j.finders.ClassFinder;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
    import org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage;
    import org.docx4j.openpackaging.parts.WordprocessingML.FooterPart;
    import org.docx4j.openpackaging.parts.WordprocessingML.HeaderPart;
    import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
    import org.docx4j.openpackaging.parts.relationships.Namespaces;
    import org.docx4j.openpackaging.parts.relationships.RelationshipsPart;
    import org.docx4j.relationships.Relationship;
    import org.docx4j.wml.Body;
    import org.docx4j.wml.Document;
    import org.docx4j.wml.Ftr;
    import org.docx4j.wml.Hdr;
    import org.docx4j.wml.ObjectFactory;
    import org.docx4j.wml.P;
    import org.docx4j.wml.R;
    import org.docx4j.wml.Tc;
    import org.docx4j.wml.Text;

    public class HeaderFooterGenerator {

        // code...

        public void replaceAllHtml(WordprocessingMLPackage wordMLPackage, MainDocumentPart mainDocumentPart,
                LinkedHashMap<String, byte[]> htmlReplacingMap) throws Exception {
            Document wmlDocumentEl = (Document) mainDocumentPart.getJaxbElement();
            Body body = wmlDocumentEl.getBody();
            if (htmlReplacingMap != null) {
                XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
                for (Map.Entry<String,byte[]> entry : htmlReplacingMap.entrySet()) {
                    // ByteArrayInputStream inputStream = new ByteArrayInputStream(entry.getValue());
                    FileInputStream inputStream = new FileInputStream("D:\\html_example.html");
                    String html_unparsed = IOUtils.toString(inputStream);
                    if (!html_unparsed.subSequence(0, 5).equals("<html>"))
                        html_unparsed = "<html>" + html_unparsed;
                    if (!html_unparsed.subSequence(html_unparsed.length()-6, html_unparsed.length()).equals("</html>"))
                        html_unparsed = html_unparsed + "</html>";
                    String html = html_unparsed.replace(" ", "\u00A0");
                    ClassFinder classPFinder = new ClassFinder(P.class);
                    new TraversalUtil(body, classPFinder);
                    for (Object p : classPFinder.results) {
                        P pNode = (P) p;
                        ClassFinder classRFinder = new ClassFinder(R.class);
                        new TraversalUtil(pNode, classRFinder);
                        for (Object r : classRFinder.results) {
                            R rNode = (R) r;
                            ClassFinder classTFinder = new ClassFinder(Text.class);
                            new TraversalUtil(rNode, classTFinder);
                            for (Object t : classTFinder.results) {
                                Text tNode = (Text) t;
                                if (tNode.getValue().equals(entry.getKey())) {
                                    tNode.setValue("");
                                    if (entry.getValue()!= null) {
                                        List<Object> a = XHTMLImporter.convert(html, null);
                                        R rOk = (R)tNode.getParent();
                                        rOk.getContent().addAll(a);
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
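A side note on the `<html>` wrapping check in the code above: `subSequence(0, 5)` returns five characters, so it can never equal the six-character `"<html>"`, and the closing check compares six characters against the seven-character `"</html>"`. Both tags are therefore always added, even when the input already has them. Below is a minimal sketch of a prefix/suffix test that avoids this. It assumes a case-insensitive match is acceptable and that an opening tag with attributes should count too; the class and method names are hypothetical and not part of docx4j.

```java
import java.util.Locale;

class HtmlWrap {

    /**
     * Wraps a fragment in <html>...</html> unless the tags are already present.
     * Matching is case-insensitive, and "<html" is used as the prefix so that
     * an opening tag carrying attributes is also recognised (an assumption).
     */
    static String ensureHtmlWrapped(String fragment) {
        String result = fragment.trim();
        String lower = result.toLowerCase(Locale.ROOT);
        if (!lower.startsWith("<html")) {
            result = "<html>" + result;
        }
        if (!lower.endsWith("</html>")) {
            result = result + "</html>";
        }
        return result;
    }

    public static void main(String[] args) {
        // A bare fragment gets wrapped.
        System.out.println(ensureHtmlWrapped("<p>HTML TEST</p>"));
        // An already wrapped document is returned unchanged.
        System.out.println(ensureHtmlWrapped("<HTML><p>HTML TEST</p></HTML>"));
    }
}
```

Whether the importer accepts upper-case element names is a separate question that this sketch does not address.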

And the errors

Code:

    22 dic 2016 10:41:44 | WARN | PropertyFactory .createProperties(160) | TODO - implement for CTTblStylePr!
    22 dic 2016 10:41:44 | WARN | PropertyFactory .createProperties(160) | TODO - implement for CTTblStylePr!
    22 dic 2016 10:41:44 | WARN | HtmlCssHelper .createCssForStyles(193) | ! null rPr for character style Carpredefinitoparagrafo
    org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
    org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
    org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
    org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
    org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
    org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
    org.docx4j.org.xhtmlrenderer.load INFO:: Loaded document in ~2ms
    org.docx4j.org.xhtmlrenderer.load INFO:: TIME: parse stylesheets 138ms
    org.docx4j.org.xhtmlrenderer.match INFO:: media = print
    org.docx4j.org.xhtmlrenderer.match INFO:: Matcher created with 181 selectors
    22 dic 2016 10:41:45 | WARN | SpaceAfter .<init>(79) | No support for unit: CSS_EMS; instead of em, please use an absolute unit.
    22 dic 2016 10:41:45 | WARN | FontHandler .setRFont(91) | No mapping for: 'serif'

Any ideas?
Thank you so much!

Posted: **Thu Dec 22, 2016 9:28 pm**

What is your input XHTML?

What output are you getting?

What is the error, specifically, of concern?

Posted: **Thu Dec 22, 2016 9:37 pm**

Thank you for your answer.
I tried loading the HTML both from the DB and from a file. The HTML itself is not the problem: I log its content, and it is displayed correctly in the console.

I tried with a simple piece of HTML:

Code:

    <HTML><p>HTML TEST</p></HTML>

I have to replace a tag with HTML content.

Code:

    if (tNode.getValue().equals(entry.getKey())) {
        tNode.setValue("");
        if (entry.getValue()!= null) {
            List<Object> a = XHTMLImporter.convert(html, null);
            R rOk = (R)tNode.getParent();
            rOk.getContent().addAll(a);
        }
    }

The tag is correctly replaced with "", but the HTML content is not inserted.
The result is a blank DOCX file.

So the problem is that the HTML content is not added.

Posted: **Fri Dec 23, 2016 7:23 am**

Verify your List<Object> a has content.

That will be block level content, which you should not be adding to your R rOk.

You'll need to replace the parent P with it. (In other words, you can't add it to a P either!)
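The splice described above can be sketched with plain lists. The helper below is a hypothetical illustration, not docx4j API: it swaps one element of a content list for a list of replacement objects at the same position, and does nothing if the replacement is empty. In docx4j, the list would presumably be the content of the paragraph's parent, for example `((ContentAccessor) p.getParent()).getContent()`, assuming that parent implements `ContentAccessor` (as `Body` and `Tc` do), and the replacement would be the result of `XHTMLImporter.convert(...)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReplaceInParent {

    /**
     * Replaces target (matched by identity) in content with every element of
     * replacement, at the same index. Returns false and leaves content
     * untouched if replacement is null or empty, or if target is not found.
     */
    static boolean replaceWithAll(List<Object> content, Object target, List<Object> replacement) {
        if (replacement == null || replacement.isEmpty()) {
            return false; // nothing was converted, keep the original element
        }
        for (int i = 0; i < content.size(); i++) {
            if (content.get(i) == target) {
                content.remove(i);
                content.addAll(i, replacement);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-ins: in docx4j these would be the body content, the P that
        // holds the tag, and the block-level objects returned by convert().
        Object placeholderParagraph = new Object();
        List<Object> body = new ArrayList<>(Arrays.asList("intro", placeholderParagraph, "outro"));
        List<Object> converted = Arrays.asList("p1", "tbl1");

        replaceWithAll(body, placeholderParagraph, converted);
        System.out.println(body); // prints [intro, p1, tbl1, outro]
    }
}
```

Identity comparison is used instead of `indexOf` so that the exact paragraph object is matched regardless of how `equals` is implemented.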

Error converting HTML in separate thread