Plutext

Posted: **Tue Mar 11, 2014 9:05 pm**

Hello,

Do you have any hints on which properties (or anything else) to set so that headings get transformed?

XHTMLImporter is a really great approach for generating .docx documents from XHTML markup.

I've played around with XHTMLImporter and noticed that the tags h1, h2, h3 are rendered as bold text in the Word document.
But I would have expected:

h1 .. heading1
h2 .. heading2
h3 .. heading3

Thx in advance for any hint,
Willi

P.S. With this line of code, the Heading1 style works in general in the generated Word document:

wordMLPackage.getMainDocumentPart().addStyledParagraphOfText("Heading1", "As is Heading1");

Posted: **Thu Mar 13, 2014 8:27 pm**

willi.firulais wrote: I would have expected:

h1 .. heading1
h2 .. heading2
h3 .. heading3

I'll look to add an option to do this in the next week or so.

Posted: **Thu Mar 13, 2014 11:32 pm**

jason wrote: I'll look to add an option to do this in the next week or so.

From your statement I assume that there is no mapping currently.
But it's really great to hear that such a feature will shortly be in XHTMLImporter.

Thx a lot,
Willi

Posted: **Fri Mar 14, 2014 9:02 pm**

As a simple workaround, I've used HtmlCleaner to add a CSS class to the h1 tag so that XHTMLImporter can match the class with the WordML style name.

The disadvantage of this workaround is that all CSS style information inherited from, e.g., the HTML page is applied to the paragraph. If a style is defined in Word for headings (e.g. because a Word template is loaded before transformation), it would be great, as an enhancement request, if the Word style superseded the CSS style.

Code:

    <h1>My Chapter</h1>

gets transformed to

Code:

    <h1 class="Heading1">My Chapter</h1>

h1.class=Heading1
or
h1.class=berschrift1 .. note that in German Word the Heading 1 style has the ID "berschrift1" (from "Überschrift 1"; the "Ü" is dropped)

Code:

    import org.htmlcleaner.*;

    HtmlCleaner cleaner = new HtmlCleaner();
    CleanerProperties props = cleaner.getProperties();

    // Rewrite every <h1> as <h1 class="Heading1">, preserving its other attributes
    CleanerTransformations transformations = new CleanerTransformations();
    TagTransformation tt = new TagTransformation("h1", "h1", true);
    tt.addAttributeTransformation("class", "Heading1");
    transformations.addTransformation(tt);
    props.setCleanerTransformations(transformations);

    TagNode tagNode = cleaner.clean(xhtml);
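For input that is already well-formed XHTML, the same class-tagging idea can be sketched with the JDK's own DOM API instead of HtmlCleaner. This is only an illustration of the workaround, not part of docx4j: the class name `HeadingClassTagger` and the method `tagHeadings` are made up for this example, the style names `Heading1` to `Heading3` are assumed to exist in the target docx, and the input is assumed to carry no DOCTYPE (otherwise the parser may try to fetch the DTD).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j or HtmlCleaner
class HeadingClassTagger {

    /** Adds class="Heading<n>" to every h1..h3 that has no class attribute yet. */
    static Document tagHeadings(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        for (int level = 1; level <= 3; level++) {
            NodeList headings = doc.getElementsByTagName("h" + level);
            for (int i = 0; i < headings.getLength(); i++) {
                Element heading = (Element) headings.item(i);
                // Leave an explicit class chosen by the author untouched
                if (!heading.hasAttribute("class")) {
                    heading.setAttribute("class", "Heading" + level);
                }
            }
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = tagHeadings("<html><body><h1>My Chapter</h1></body></html>");
        Element h1 = (Element) doc.getElementsByTagName("h1").item(0);
        System.out.println(h1.getAttribute("class"));
    }
}
```

The resulting document would still need to be serialized back to a string before being handed to the importer. For a German template, the assumed style names would have to be swapped for the matching style IDs.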

Posted: **Fri Mar 14, 2014 10:25 pm**

There is:

    /**
     * CLASS_TO_STYLE_ONLY: a Word style matching a class attribute will
     * be used, and nothing else
     *
     * CLASS_PLUS_OTHER: a Word style matching a class attribute will
     * be used; other css will be translated to direct formatting
     *
     * IGNORE_CLASS: css will be translated to direct formatting
     */
    public enum FormattingOption {

        CLASS_TO_STYLE_ONLY, CLASS_PLUS_OTHER, IGNORE_CLASS;
    }

In XHTMLImporterImpl, there is setParagraphFormatting. The default is CLASS_PLUS_OTHER, but it sounds like you want CLASS_TO_STYLE_ONLY (in which case you can disregard the rest of this post).

CLASS_PLUS_OTHER

If you were to use CLASS_PLUS_OTHER, it can be useful to have CSS on your HTML which matches your target docx. This prevents unwanted default CSS values having effect.

You can use HtmlCssHelper.createCssForStyles to generate that.

For the StyleTree arg, you can do:

StyleTree styleTree = wordMLPackage.getMainDocumentPart().getStyleTree();

Note that the styles which Word shows in its user interface aren't necessarily defined in the styles part of the docx. Typically, Word only writes an actual definition in the styles part if the style is actually being used in the document.

Of the styles which are actually defined, docx4j typically builds a StyleTree from the subset that is actually used somewhere in the document:

    /**
     * Build a StyleTree for stylesInUse.
     *
     * @param stylesInUse styles actually in use in the main document part, headers/footers, footnotes/endnotes
     * @param allStyles styles defined in the style definitions part
     */
    public StyleTree(Set<String> stylesInUse, Map<String, Style> allStyles)

Your first step then is to ensure the styles you are interested in are actually defined in styles.xml.

After that, you could define a Set<String> stylesInUse which specifies all defined styles (ie the keys in Map<String, Style> allStyles) and use that to construct StyleTree.
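A minimal sketch of that step, kept free of docx4j types so it runs on its own: it simply copies the keys of the style map into a set. The class `AllStylesInUse` and the method `allDefinedStyleIds` are hypothetical names, and how you obtain the `allStyles` map is left open here.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; S stands in for docx4j's Style type
class AllStylesInUse {

    /** Treats every defined style as "in use" by copying the keys of allStyles. */
    static <S> Set<String> allDefinedStyleIds(Map<String, S> allStyles) {
        return new LinkedHashSet<>(allStyles.keySet());
    }

    public static void main(String[] args) {
        Map<String, Object> allStyles = new LinkedHashMap<>();
        allStyles.put("Normal", new Object());
        allStyles.put("Heading1", new Object());

        Set<String> stylesInUse = allDefinedStyleIds(allStyles);
        // With docx4j on the classpath, this set would be the first argument of
        // the StyleTree(Set<String> stylesInUse, Map<String, Style> allStyles)
        // constructor quoted above.
        System.out.println(stylesInUse);
    }
}
```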

For XHTML import purposes I guess it could be useful to add a constructor:

public StyleTree(Map<String, Style> allStyles)

Posted: **Mon Mar 17, 2014 11:08 am**

willi.firulais wrote: As a simple workaround, I've used HtmlCleaner to add a CSS class to the h1 tag so that XHTMLImporter can match the class with the WordML style name.

I'm considering something similar .. a mapping of element names to Word styles.

In the CLASS_TO_STYLE_ONLY and CLASS_PLUS_OTHER cases, the mapping would be used only if there was no class value (or no Word style whose name equals the class value?), i.e. @class trumps the element name.

In the IGNORE_CLASS case, the mapping of element names to Word styles could always be used. If you don't want that, just make the map empty.
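The precedence rules described here could be sketched roughly as follows. This is an illustration of the proposal only, not docx4j code: `ElementStyleResolver` and `resolveStyle` are hypothetical names, the enum merely mirrors the FormattingOption values quoted earlier in the thread, and the sketch adopts the parenthetical reading (a class value that matches no defined Word style falls through to the element map).

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the proposed element-name mapping
class ElementStyleResolver {

    // Mirrors the FormattingOption enum quoted earlier in the thread
    enum FormattingOption { CLASS_TO_STYLE_ONLY, CLASS_PLUS_OTHER, IGNORE_CLASS }

    /**
     * Picks a Word style for an element, or empty if none applies.
     * classVal may be null; definedStyles holds the styles present in the docx.
     */
    static Optional<String> resolveStyle(String elementName, String classVal,
            FormattingOption option, Set<String> definedStyles,
            Map<String, String> elementToStyle) {
        if (option != FormattingOption.IGNORE_CLASS
                && classVal != null && definedStyles.contains(classVal)) {
            return Optional.of(classVal); // @class trumps the element name
        }
        // No usable class, or classes are ignored: fall back to the element map.
        // An empty map switches the element-name mapping off.
        return Optional.ofNullable(elementToStyle.get(elementName));
    }

    public static void main(String[] args) {
        Set<String> styles = Set.of("Heading1", "Quote");
        Map<String, String> map = Map.of("h1", "Heading1");
        System.out.println(resolveStyle("h1", null,
                FormattingOption.CLASS_PLUS_OTHER, styles, map));
        System.out.println(resolveStyle("h1", "Quote",
                FormattingOption.CLASS_PLUS_OTHER, styles, map));
        System.out.println(resolveStyle("h1", "Quote",
                FormattingOption.IGNORE_CLASS, styles, map));
    }
}
```

In this sketch an h1 with class "Quote" resolves to "Quote" under CLASS_PLUS_OTHER, but to "Heading1" under IGNORE_CLASS, where only the element map is consulted.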

willi.firulais wrote: The disadvantage of this workaround is that all CSS style information inherited from, e.g., the HTML page is applied to the paragraph. If a style is defined in Word for headings (e.g. because a Word template is loaded before transformation), it would be great, as an enhancement request, if the Word style superseded the CSS style.

Since the approach I describe above would happen after XHTML renderer has parsed the xhtml + css, it wouldn't alter the css computed by XHTML renderer.

Posted: **Mon Mar 17, 2014 11:25 pm**

Hello,

It's great to hear that from you. It sounds really great that there will be a mapping of element names to Word styles.

The CLASS_TO_STYLE_ONLY, CLASS_PLUS_OTHER and IGNORE_CLASS options should be settable per mapping entry.
This mapping should only be used if someone wants to customize the default behaviour (as you have described).

e.g. (some kind of pseudo-JSON, to express what I have in mind):

{
mapping: [
{class: "Standard", style: "Standard", FormattingOption: CLASS_PLUS_OTHER},
{class: "Head1", style: "berschrift1", FormattingOption: CLASS_TO_STYLE_ONLY}
]
}

Thx, Willi

Posted: **Wed Mar 19, 2014 8:25 am**

Hi, what's the rationale / use case for making FormattingOption settable per mapping entry?

Certainly it can be done, but unless there's a good reason, it may be better to keep it simple...

At present, FormattingOption can be set independently for paragraph, run and table level styles.

Posted: **Mon Aug 04, 2014 8:07 pm**

There is support for mapping eg h1 to "Heading 1" style, in the 3.2.0 beta.

To enable it, you'll need a properties file with the content:

https://github.com/plutext/docx4j-Impor ... properties

Posted: **Fri Jun 09, 2017 12:48 am**

Hi,

I tried this feature, but it didn't work for me.

I tested with this simple xhtml:

Code:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head>
        <title>Heading</title>
      </head>
      <body>
        <h1>level 1</h1>
      </body>
    </html>

and with this content in docx4j-ImportXHTML.properties:

Code:

    docx4j-ImportXHTML.Element.Heading.MapToStyle=true

The result still did not have the heading style:

Code:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml"
        xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
        xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
        xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
        xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
        xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
        xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        xmlns:ns10="http://schemas.openxmlformats.org/schemaLibrary/2006/main"
        xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
        xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart"
        xmlns:ns13="http://schemas.openxmlformats.org/drawingml/2006/chartDrawing"
        xmlns:dgm="http://schemas.openxmlformats.org/drawingml/2006/diagram"
        xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture"
        xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing"
        xmlns:dsp="http://schemas.microsoft.com/office/drawing/2008/diagram"
        xmlns:ns18="urn:schemas-microsoft-com:office:excel"
        xmlns:o="urn:schemas-microsoft-com:office:office"
        xmlns:v="urn:schemas-microsoft-com:vml"
        xmlns:w10="urn:schemas-microsoft-com:office:word"
        xmlns:ns22="urn:schemas-microsoft-com:office:powerpoint"
        xmlns:ns24="http://schemas.microsoft.com/office/2006/coverPageProps"
        xmlns:odx="http://opendope.org/xpaths"
        xmlns:odc="http://opendope.org/conditions"
        xmlns:odq="http://opendope.org/questions"
        xmlns:oda="http://opendope.org/answers"
        xmlns:odi="http://opendope.org/components"
        xmlns:odgm="http://opendope.org/SmartArt/DataHierarchy"
        xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
        xmlns:ns32="http://schemas.openxmlformats.org/drawingml/2006/compatibility"
        xmlns:ns33="http://schemas.openxmlformats.org/drawingml/2006/lockedCanvas">
      <w:body>
        <w:p>
          <w:pPr>
            <w:spacing w:after="0"/>
            <w:ind w:left="0"/>
            <w:jc w:val="left"/>
          </w:pPr>
          <w:r>
            <w:rPr>
              <w:rFonts w:hAnsi="Times New Roman" w:ascii="Times New Roman"/>
              <w:b/>
              <w:i w:val="false"/>
              <w:color w:val="000000"/>
            </w:rPr>
            <w:t>level 1</w:t>
          </w:r>
        </w:p>
        <w:sectPr>
          <w:headerReference w:type="default" r:id="rId4"/>
          <w:footerReference w:type="default" r:id="rId5"/>
          <w:pgSz w:code="9" w:h="16839" w:w="11907"/>
          <w:pgMar w:left="1440" w:bottom="1440" w:right="1440" w:top="1440"/>
        </w:sectPr>
      </w:body>
    </w:document>

After some investigation I found that isHeading() and handleHeadingElement() in XHTMLImporterImpl use getLocalName(), which returns null here. getTagName(), by contrast, returns the tag name.

Is getLocalName() OK here? Shouldn't it be getTagName() instead?

Thanks,
László
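For background, the DOM API itself allows getLocalName() to be null: per the org.w3c.dom.Node documentation, nodes created with a DOM Level 1 method such as Document.createElement() always return null there. The JDK-only sketch below reproduces that with a parser that is not namespace-aware. Whether this is what happens to the elements XHTMLImporterImpl sees is an assumption, not something confirmed in this thread; `LocalNameDemo` and `parseRoot` are names made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses only the JDK's built-in XML parser
class LocalNameDemo {

    /** Parses the snippet and returns its root element. */
    static Element parseRoot(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<h1 xmlns=\"http://www.w3.org/1999/xhtml\">level 1</h1>";

        // Not namespace-aware: the local name is null, the tag name is still set
        Element plain = parseRoot(xml, false);
        System.out.println(plain.getLocalName() + " / " + plain.getTagName());

        // Namespace-aware: both the local name and the tag name are available
        Element nsAware = parseRoot(xml, true);
        System.out.println(nsAware.getLocalName() + " / " + nsAware.getTagName());
    }
}
```

For unprefixed XHTML elements the tag name and the local name are the same string, so a null-safe fallback from getLocalName() to getTagName() is one possible way to handle both kinds of DOM.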


XHTMLImporter and headings
