
Error converting HTML in separate thread

Posted: Thu Dec 22, 2016 9:23 pm
by antonio9285
Hi everyone. I'm trying to convert HTML to DOCX, but I always get errors.
I have read a lot of forum discussions, but I have no idea how to solve the problem.
I'm calling a class in
My code:

Code:
   @Override
   public void generateDoc(final String codSocietaSoa, final int docType) throws Exception {
      logger.debug("TEXT REPLACER SERVICE - start generateDoc");
      Callable<Object> c = new Callable<Object>() {

         @Override
         public Object call() throws Exception {
            logger.debug("TEXT REPLACER SERVICE - start async thread");
            try {
               // GET TEMPLATE FROM DB
               // Other operations...
               HeaderFooterGenerator headerFooterGenerator = new HeaderFooterGenerator();
               ByteArrayOutputStream baosFinal = headerFooterGenerator.replaceAll(bais, imageReplacingMap, graphTagByteMap, htmlReplacingMap);

               // CLOSE
               // operations...
               // SAVE BYTE ARRAY TO DB
               // operations...
            } catch (Exception e) {            
               e.printStackTrace();
               saveDocumentService.insertLogError(codSocietaSoa, docType, e.getMessage());
            }
            return null;
         }
      };
      THREAD_POOL.submit(c);
   }



This code calls the generator that replaces the HTML code. (The name HeaderFooterGenerator is not quite accurate, I know.)

Code:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.commons.io.IOUtils;
import org.apache.log4j.Logger;
import org.docx4j.TraversalUtil;
import org.docx4j.UnitsOfMeasurement;
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.dml.wordprocessingDrawing.Inline;
import org.docx4j.finders.ClassFinder;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage;
import org.docx4j.openpackaging.parts.WordprocessingML.FooterPart;
import org.docx4j.openpackaging.parts.WordprocessingML.HeaderPart;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.openpackaging.parts.relationships.Namespaces;
import org.docx4j.openpackaging.parts.relationships.RelationshipsPart;
import org.docx4j.relationships.Relationship;
import org.docx4j.wml.Body;
import org.docx4j.wml.Document;
import org.docx4j.wml.Ftr;
import org.docx4j.wml.Hdr;
import org.docx4j.wml.ObjectFactory;
import org.docx4j.wml.P;
import org.docx4j.wml.R;
import org.docx4j.wml.Tc;
import org.docx4j.wml.Text;

public class HeaderFooterGenerator {
// code...
   public void replaceAllHtml(WordprocessingMLPackage wordMLPackage, MainDocumentPart mainDocumentPart, LinkedHashMap<String, byte[]> htmlReplacingMap) throws Exception {
      Document wmlDocumentEl = (Document) mainDocumentPart.getJaxbElement();
      Body body = wmlDocumentEl.getBody();
      if (htmlReplacingMap != null) {
         XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
         for (Map.Entry<String, byte[]> entry : htmlReplacingMap.entrySet()) {
//            ByteArrayInputStream inputStream = new ByteArrayInputStream(entry.getValue());
            FileInputStream inputStream = new FileInputStream("D:\\html_example.html");
            String html_unparsed = IOUtils.toString(inputStream);
            // Wrap the content in <html>...</html> unless it is already wrapped
            if (!html_unparsed.startsWith("<html>")) html_unparsed = "<html>" + html_unparsed;
            if (!html_unparsed.endsWith("</html>")) html_unparsed = html_unparsed + "</html>";
            // Replace &nbsp; entities with the literal no-break space character
            String html = html_unparsed.replace("&nbsp;", "\u00A0");
            // Walk every paragraph, run and text node looking for the placeholder tag
            ClassFinder classPFinder = new ClassFinder(P.class);
            new TraversalUtil(body, classPFinder);
            for (Object p : classPFinder.results) {
               P pNode = (P) p;
               ClassFinder classRFinder = new ClassFinder(R.class);
               new TraversalUtil(pNode, classRFinder);
               for (Object r : classRFinder.results) {
                  R rNode = (R) r;
                  ClassFinder classTFinder = new ClassFinder(Text.class);
                  new TraversalUtil(rNode, classTFinder);
                  for (Object t : classTFinder.results) {
                     Text tNode = (Text) t;
                     if (tNode.getValue().equals(entry.getKey())) {
                        tNode.setValue("");
                        if (entry.getValue() != null) {
                           List<Object> a = XHTMLImporter.convert(html, null);
                           R rOk = (R) tNode.getParent();
                           rOk.getContent().addAll(a);
                        }
                     }
                  }
               }
            }
         }
      }
   }
}


And the errors:

Code:
22 dic 2016 10:41:44 | WARN  | PropertyFactory               .createProperties(160) | TODO - implement for CTTblStylePr!
22 dic 2016 10:41:44 | WARN  | PropertyFactory               .createProperties(160) | TODO - implement for CTTblStylePr!
22 dic 2016 10:41:44 | WARN  | HtmlCssHelper                 .createCssForStyles(193) | ! null rPr for character style Carpredefinitoparagrafo
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
org.docx4j.org.xhtmlrenderer.load INFO:: Loaded document in ~2ms
org.docx4j.org.xhtmlrenderer.load INFO:: TIME: parse stylesheets  138ms
org.docx4j.org.xhtmlrenderer.match INFO:: media = print
org.docx4j.org.xhtmlrenderer.match INFO:: Matcher created with 181 selectors
22 dic 2016 10:41:45 | WARN  | SpaceAfter                    .<init>(79) | No support for unit: CSS_EMS; instead of em, please use an absolute unit.
22 dic 2016 10:41:45 | WARN  | FontHandler                   .setRFont(91) | No mapping for: 'serif'


Any ideas?
Thank you so much!

Re: Error converting HTML in separate thread

Posted: Thu Dec 22, 2016 9:28 pm
by jason
What is your input XHTML?

What output are you getting?

What is the error, specifically, of concern?

Re: Error converting HTML in separate thread

Posted: Thu Dec 22, 2016 9:37 pm
by antonio9285
Thank you for your answer.
I tried reading the HTML both from the DB and from a file. The HTML itself is not the problem: I log the HTML content and it is displayed correctly in the console.

I tried with a simple piece of HTML:
Code:
<HTML><p>HTML TEST</p></HTML>


I have to replace a TAG with HTML content.
Code:
if (tNode.getValue().equals(entry.getKey())) {
   tNode.setValue("");
   if (entry.getValue() != null) {
      List<Object> a = XHTMLImporter.convert(html, null);
      R rOk = (R) tNode.getParent();
      rOk.getContent().addAll(a);
   }
}

The TAG is correctly replaced with "", but the HTML content is never inserted.
The result is a blank DOCX file.

So the problem is that the HTML content is not added.

Re: Error converting HTML in separate thread

Posted: Fri Dec 23, 2016 7:23 am
by jason
Verify your List<Object> a has content.

That will be block level content, which you should not be adding to your R rOk.

You'll need to replace the parent P with it. (In other words, you can't add it to a P either!)
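
A minimal sketch of that replacement step, using plain java.util lists as stand-ins for the docx4j content lists so that it runs without docx4j on the classpath (that substitution is an assumption made for illustration). In your code, parentContent would presumably be ((ContentAccessor) pNode.getParent()).getContent() and replacement would be the list returned by XHTMLImporter.convert(html, null). The class and method names BlockReplacer and replaceBlock are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: swaps one block-level element for a list of others. */
final class BlockReplacer {

    /**
     * Replaces oldBlock inside parentContent with every element of replacement,
     * keeping the original position. Returns false if oldBlock is not found.
     */
    static boolean replaceBlock(List<Object> parentContent, Object oldBlock, List<Object> replacement) {
        // Identity search: the placeholder paragraph is one specific object,
        // so compare references rather than relying on equals().
        for (int i = 0; i < parentContent.size(); i++) {
            if (parentContent.get(i) == oldBlock) {
                parentContent.remove(i);
                parentContent.addAll(i, replacement);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Strings stand in for paragraphs here; in docx4j these would be P objects.
        String placeholder = "P[TAG]";
        List<Object> body = new ArrayList<>(Arrays.asList("P[intro]", placeholder, "P[outro]"));
        List<Object> converted = Arrays.asList("P[HTML TEST]");

        replaceBlock(body, placeholder, converted);
        System.out.println(body); // prints [P[intro], P[HTML TEST], P[outro]]
    }
}
```

The point is that the converted content ends up as a sibling of the other paragraphs in the parent's content list (the Body, a table cell, and so on), and never inside an R or a P.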