Docx to html and back to docx

Posted: Fri Jun 13, 2014 9:42 pm
by barnaba_hunters
Hi all,
I have a problem with docx4j. Using the ConvertOutHtml example, I convert from docx to HTML and then use
Code: Select all
      
      wordMLPackage
            .getMainDocumentPart()
            .getContent()
            .addAll(xHTMLImporter.convert(
                  new File(inputfilepath + ".html"), null));

      wordMLPackage.save(new java.io.File(outputfilepath));
      System.out.println("Saved: " + outputfilepath);

to convert back to docx. My problem is that the output file uses only the Calibri font and has no headers or footers. The fonts and footers appear in the HTML file without problems. Could anyone help?
Thanks,
Mateusz

Re: Docx to html and back to docx

Posted: Sat Jun 14, 2014 6:51 pm
by jason
Re fonts, see https://github.com/plutext/docx4j-Impor ... rImpl.java at line 236:

    /**
     * Map a font family, for example "Century Gothic" in:
     *
     *    font-family:"Century Gothic", Helvetica, Arial, sans-serif;
     *
     * to a w:rFonts object, for example:
     *
     *    <w:rFonts w:ascii="Arial Black" w:hAnsi="Arial Black"/>
     *
     * Assuming style font-family:"Century Gothic", Helvetica, Arial, sans-serif;
     * the first font family for which there is a mapping is the one
     * which will be used.
     *
     * xhtml-renderer's CSSName defaults font-family: serif
     *
     * It is your responsibility to ensure a suitable font is available
     * on the target system (or embedded in the docx package).  If we
     * (eventually) support CSS @font-face, docx4j could do that
     * for you (at least for font formats we can convert to something
     * embeddable).
     *
     * @since 3.0
     */
    public static void addFontMapping(String cssFontFamily, RFonts rFonts) {
        fontFamilyToFont.put(cssFontFamily, rFonts);
    }
 
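
To illustrate the "first font family for which there is a mapping" rule from the Javadoc above, here is a minimal standalone sketch. It is not docx4j code: the class FontFamilyLookup, its resolve method, and the use of plain font-name strings in place of RFonts objects are all hypothetical, chosen so the example runs without the library. The quote stripping and case-insensitive comparison are assumptions too; the real importer may normalize names differently.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for the importer's font-family lookup; not docx4j code. */
class FontFamilyLookup {

    // Plays the role of fontFamilyToFont. Values are plain font names here
    // (instead of RFonts objects) so the sketch needs no docx4j dependency.
    private static final Map<String, String> MAPPINGS = new LinkedHashMap<>();

    static void addFontMapping(String cssFontFamily, String wordFontName) {
        MAPPINGS.put(normalize(cssFontFamily), wordFontName);
    }

    /** Returns the mapped font for the first family in the CSS list that has a mapping. */
    static Optional<String> resolve(String cssFontFamilyValue) {
        // Simplification: splits on commas, so family names containing commas are not handled.
        for (String family : cssFontFamilyValue.split(",")) {
            String mapped = MAPPINGS.get(normalize(family));
            if (mapped != null) {
                return Optional.of(mapped);
            }
        }
        return Optional.empty();
    }

    // Assumption: surrounding quotes and whitespace are dropped and case is ignored.
    private static String normalize(String family) {
        return family.trim()
                .replaceAll("^[\"']|[\"']$", "")
                .trim()
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        addFontMapping("Arial", "Arial");
        addFontMapping("Times New Roman", "Times New Roman");
        // "Century Gothic" and Helvetica have no mapping, so Arial is chosen.
        System.out.println(resolve("\"Century Gothic\", Helvetica, Arial, sans-serif"));
    }
}
```

With only Arial and Times New Roman registered, a style that lists neither resolves to nothing in this sketch, which would be consistent with the imported text falling back to the document default font.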


Re HTML/CSS headers/footers to docx, iirc currently there is no explicit support for this.

Re: Docx to html and back to docx

Posted: Thu Jul 24, 2014 3:24 am
by barnaba_hunters
Sorry, but this is still unclear to me. So there is a mapping method, addFontMapping(..), but I still need RFonts objects for Arial and Times New Roman. How do I create them? Is this configured in XML files?
Thanks for your help.

Re: Docx to html and back to docx

Posted: Thu Jul 24, 2014 11:19 pm
by barnaba_hunters
Ok, I found an example of how to register a font:
Code: Select all

      // Set up the font mapping
      RFonts rfonts = Context.getWmlObjectFactory().createRFonts();
      rfonts.setAscii("Century Gothic");
      XHTMLImporterImpl.addFontMapping("Century Gothic", rfonts);

EDIT:

Ok, now it works. It turned out that I had
Code: Select all
xHTMLImporter.setParagraphFormatting(FormattingOption.IGNORE_CLASS);

plus some unnecessary lines. So the code in the end looks like this:


Code: Select all
         // Restrict which physical fonts docx4j discovers.
         // Windows:
         // String regex = ".*(calibri|camb|cour|arial|symb|times|Times|zapf).*";
         // Mac:
         // String regex = ".*(Courier New|Arial|Times New Roman|Comic Sans|Georgia|Impact|Lucida Console|Lucida Sans Unicode|Palatino Linotype|Tahoma|Trebuchet|Verdana|Symbol|Webdings|Wingdings|MS Sans Serif|MS Serif).*";
         String regex = ".*(arial|times).*";
         PhysicalFonts.setRegex(regex);

         // Document loading (required)
         WordprocessingMLPackage wordMLPackage;
         System.out.println("Loading file from " + inputfilepath);
         // wordMLPackage = Docx4J.load(new java.io.File(inputfilepath));

         HTMLSettings htmlSettings = Docx4J.createHTMLSettings();

         htmlSettings.setImageDirPath(inputfilepath + "_files");
         htmlSettings.setImageTargetUri(inputfilepath
               .substring(inputfilepath.lastIndexOf("/") + 1) + "_files");

         String userCSS = "html, body, div, span, h1, h2, h3, h4, h5, h6, p, a, img,  ol, ul, li, table, caption, tbody, tfoot, thead, tr, th, td "
               + "{ margin: 0; padding: 0; border: 0;}"
               + "body {line-height: 1;} ";
         htmlSettings.setUserCSS(userCSS);

         wordMLPackage = WordprocessingMLPackage.createPackage();
         RFonts arialRFonts = Context.getWmlObjectFactory().createRFonts();
         arialRFonts.setAscii("Arial");
         arialRFonts.setHint(org.docx4j.wml.STHint.DEFAULT);
         arialRFonts.setHAnsi("Arial");
         XHTMLImporterImpl.addFontMapping("Arial", arialRFonts);
         RFonts timesRFonts = Context.getWmlObjectFactory().createRFonts();
         timesRFonts.setAscii("Times");
         timesRFonts.setHint(org.docx4j.wml.STHint.DEFAULT);
         timesRFonts.setHAnsi("Times");
         XHTMLImporterImpl.addFontMapping("Times New Roman", timesRFonts);
         XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(
               wordMLPackage);
         xHTMLImporter.setHyperlinkStyle("Hyperlink");
         // xHTMLImporter.setParagraphFormatting(FormattingOption.IGNORE_CLASS);
         wordMLPackage.getDocumentModel().getSections().get(0)
               .getPageDimensions().setPgSize(PageSizePaper.A4, true);
         wordMLPackage.getDocumentModel().getSections().get(0)
               .getPageDimensions().setMargins(MarginsWellKnown.NARROW);
         wordMLPackage.getMainDocumentPart().getContent()
               .addAll(xHTMLImporter.convert(tempHtmlFile, null));

         wordMLPackage.save(tempDocxFile);
         // Report the file that was actually written.
         System.out.println("Saved: " + tempDocxFile);
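
One thing to watch with the font regex in the code above: Java patterns are case-sensitive by default. The sketch below assumes (this is not confirmed in the thread) that the pattern is applied as a whole-string match against each font's file location. The class FontRegexCheck and the sample paths are hypothetical and only illustrate how java.lang.String.matches behaves.

```java
/** Hypothetical helper showing how a font regex behaves under whole-string matching. */
class FontRegexCheck {

    // Assumption: the pattern must match the entire font location string.
    static boolean accepted(String regex, String fontLocation) {
        return fontLocation.matches(regex);
    }

    public static void main(String[] args) {
        String regex = ".*(arial|times).*";
        String[] sampleLocations = {
            "file:/C:/Windows/Fonts/arial.ttf",   // accepted
            "file:/C:/Windows/Fonts/times.ttf",   // accepted
            "file:/C:/Windows/Fonts/calibri.ttf", // rejected: not in the alternation
            "file:/Library/Fonts/Arial.ttf"       // rejected: capital "A" does not match "arial"
        };
        for (String location : sampleLocations) {
            System.out.println(location + " -> " + accepted(regex, location));
        }
        // Prefixing (?i) makes the whole pattern case-insensitive.
        System.out.println(accepted("(?i)" + regex, "file:/Library/Fonts/Arial.ttf"));
    }
}
```

If the font files on a given system are capitalized differently, the (?i) flag or an explicit alternative such as the times|Times pair in the commented-out Windows regex avoids silently excluding them.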

Re: Docx to html and back to docx

Posted: Thu Jul 31, 2014 10:33 am
by jason
With https://github.com/plutext/docx4j-Impor ... 392acccbde
common cases ought to be handled automagically in 3.2.0