Save style when converting docx->html->docx

Posted: Thu Nov 15, 2018 8:35 pm
by scread
Hi there,
First, I want to thank you for a great library!

We want to use docx4j and docx4j-ImportXHTML to convert docx->html->docx. While a document is in HTML format, users can edit it.
Inspired by this example (https://github.com/plutext/docx4j-Impor ... dBack.java), I tested what result I could get,
and unfortunately I ran into an issue when converting back from HTML to docx: some styles are converted incorrectly.
I think my issue is only with "Spacing" (as it is called in Word). Each text block has After = 11.35 pt and line spacing = 1.15.
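
For reference, a minimal sketch of how those two Word values would look inside a `w:spacing` element, which is useful when comparing the original and the round-tripped document.xml or styles.xml by hand. It assumes the usual WordprocessingML units (`w:after` in twentieths of a point, `w:line` in 240ths of a line when `w:lineRule="auto"`); the class and method names are hypothetical and not part of docx4j.

```java
// Sketch: convert Word's UI spacing values into the units used by <w:spacing>.
// Assumptions: w:after is stored in twentieths of a point, and w:line is stored
// in 240ths of a line when w:lineRule="auto". All names here are hypothetical.
final class SpacingUnits {

    /** Points to twentieths of a point (the unit of w:before / w:after). */
    static long pointsToTwips(double points) {
        return Math.round(points * 20.0);
    }

    /** Line-spacing multiple (e.g. 1.15) to 240ths of a line. */
    static long multipleToLineUnits(double multiple) {
        return Math.round(multiple * 240.0);
    }

    /** Builds the w:spacing element one would expect to find for these values. */
    static String spacingElement(double afterPoints, double lineMultiple) {
        return "<w:spacing w:after=\"" + pointsToTwips(afterPoints)
                + "\" w:line=\"" + multipleToLineUnits(lineMultiple)
                + "\" w:lineRule=\"auto\"/>";
    }

    public static void main(String[] args) {
        // Values from the post: After = 11.35 pt, line spacing = 1.15
        System.out.println(spacingElement(11.35, 1.15));
        // prints <w:spacing w:after="227" w:line="276" w:lineRule="auto"/>
    }
}
```

If the converted document shows different numbers (or no `w:spacing` at all) on the paragraph or its style, that is where the spacing was lost.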

I managed to overcome it only by passing all styles from original docx document to the new one, like this:
Code:
// fetch the styles part from the original document
StyleDefinitionsPart styles = docxIn.getMainDocumentPart().getStyleDefinitionsPart();

// pass it to the new document
docxOut.getMainDocumentPart().setPartShortcut(styles, Namespaces.STYLES);


In my opinion, this is a good way to do it...
I wonder what the best way to do it is. Did I miss something?
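
One way to narrow this down is to compare the `w:spacing` entries in word/styles.xml of the original and the converted docx. The following is only a diagnostic sketch using the JDK (no docx4j), assuming Java 9+ for `readAllBytes`; the class and method names are hypothetical. Note that the simple pattern also matches character-level `w:spacing` inside `w:rPr`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Sketch: list every <w:spacing .../> found in a docx file's word/styles.xml.
// All names are hypothetical; this is a diagnostic aid, not docx4j API.
final class StyleSpacingDump {

    // Matches paragraph spacing (w:pPr) and character spacing (w:rPr) alike.
    private static final Pattern SPACING = Pattern.compile("<w:spacing\\b[^>]*>");

    /** Returns all w:spacing start tags that occur in the given XML text. */
    static List<String> spacingElements(String stylesXml) {
        List<String> found = new ArrayList<>();
        Matcher m = SPACING.matcher(stylesXml);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    /** Reads word/styles.xml from a docx (zip) file; empty string if absent. */
    static String readStylesXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/styles.xml");
            if (entry == null) {
                return "";
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StyleSpacingDump original.docx converted.docx
        for (String path : args) {
            System.out.println(path);
            for (String element : spacingElements(readStylesXml(path))) {
                System.out.println("  " + element);
            }
        }
    }
}
```

Running it on both files shows whether the spacing defined in the original styles is missing from the styles of the package created with `WordprocessingMLPackage.createPackage()`.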

Current version 6.0.1.

Code I'm using:
Code:
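import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.docx4j.Docx4J;
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.convert.out.HTMLSettings;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart;

public void fromDocxToHtmlAndBack() throws Exception {
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File("path_to_docx"));

    // Assumption: the system temp directory is intended, so resolve the property
    // instead of passing the literal string "java.io.tmpdir"
    String imageDir = System.getProperty("java.io.tmpdir");

    HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
    htmlSettings.setWmlPackage(wordMLPackage);
    htmlSettings.setImageDirPath(imageDir);
    htmlSettings.setImageTargetUri(imageDir);

    String htmlFilePath = "path_to_converted_html";

    // write html; close the stream so the file is complete before it is read back
    try (OutputStream os = new FileOutputStream(htmlFilePath)) {
        Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
    }

    // XHTML to docx
    File xhtmlFile = new File(htmlFilePath);

    WordprocessingMLPackage docxOut = WordprocessingMLPackage.createPackage();

    // numbering part, needed if the XHTML contains lists
    NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
    docxOut.getMainDocumentPart().addTargetPart(ndp);
    ndp.unmarshalDefaultNumbering();

    XHTMLImporterImpl xhtmlImporter = new XHTMLImporterImpl(docxOut);
    xhtmlImporter.setHyperlinkStyle("Hyperlink");

    docxOut.getMainDocumentPart().getContent().addAll(
            xhtmlImporter.convert(xhtmlFile, null));

    Docx4J.save(docxOut, new File("path_to_converted_docx"));
}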


Thank you in advance

UPD:
Font size works almost correctly in version 3.3.7,
except for a few places where the text ends up with font-size = 11px after conversion to docx.
Example:
Code:
<style>
/* PARAGRAPH STYLES */
.DocDefaults {display:block;border-top-style: none;border-bottom-style: none;border-left-style: none;border-right-style: none;margin-top: 0in;margin-bottom: 0in;line-height: 100%;color: #000000;font-size: 10.0pt;}
.Normal1 {display:block;}

/* CHARACTER STYLES */
span.DefaultParagraphFont {display:inline;}
</style>

<p class="Normal1 DocDefaults "><span class="DefaultParagraphFont " style="font-family: 'Times New Roman';white-space:pre-wrap;">Mobile and Web Application development, Ruby, Rails, </span><span class="DefaultParagraphFont " style="font-family: 'Times New Roman';">Javascript</span><span class="DefaultParagraphFont " style="font-family: 'Times New Roman';white-space:pre-wrap;">, ReactJS, React-Native, Swift, Meteor, Cordova, Node.js, MongoDB, Amazon Web Services, Git, Agile processes and methodology, </span><span class="DefaultParagraphFont " style="font-family: 'Times New Roman';">RSpec, SQL, TDD, BDD</span></p>
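
To compare the CSS above with what lands in the converted docx, it helps to know which `w:sz` value to expect. A minimal sketch, assuming `w:sz` is stored in half-points and the CSS convention 1px = 0.75pt; the class and method names are hypothetical, and this does not claim to reproduce how docx4j-ImportXHTML itself computes sizes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: find the first font-size in a CSS string and convert it to the
// half-point value that a w:sz attribute would carry. Names are hypothetical.
final class FontSizeUnits {

    private static final Pattern FONT_SIZE =
            Pattern.compile("font-size:\\s*([0-9]+(?:\\.[0-9]+)?)(pt|px)");

    /** Returns the first pt/px font-size in the CSS as points, or -1 if none. */
    static double fontSizeInPoints(String css) {
        Matcher m = FONT_SIZE.matcher(css);
        if (!m.find()) {
            return -1;
        }
        double value = Double.parseDouble(m.group(1));
        // CSS absolute units: 1px = 1/96in and 1pt = 1/72in, so 1px = 0.75pt
        return m.group(2).equals("px") ? value * 0.75 : value;
    }

    /** Points to half-points (the unit of w:sz). */
    static long toHalfPoints(double points) {
        return Math.round(points * 2.0);
    }

    public static void main(String[] args) {
        String docDefaults = ".DocDefaults {line-height: 100%;color: #000000;font-size: 10.0pt;}";
        double points = fontSizeInPoints(docDefaults);
        System.out.println(points + "pt -> w:sz=" + toHalfPoints(points));
        // prints 10.0pt -> w:sz=20
    }
}
```

With the `.DocDefaults` rule above, the runs would be expected to carry `w:sz="20"` (or inherit it); a run showing a different value points at the place where the size changed.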