
XHTMLImporterImpl and input Tag

Posted: Tue Mar 18, 2014 12:16 am
by willi.firulais
Hello,

As I haven't seen any way to get html:input tags converted by XHTMLImporterImpl, I wanted to extend the mapping. But looking at the XHTMLImporterImpl class, I haven't found any way to extend it for custom needs.

Some thoughts that came to mind while playing around with XHTMLImporterImpl: wouldn't it be great if the if/else chain in XHTMLImporterImpl were extensible by developers using this really cool library?

What about moving the code inside each if/else block of the traverse method into its own method and, of course, making some state attributes available to subclasses? E.g.
Code:
else {
  org.docx4j.wml.P currentP = this.getCurrentParagraph(true);
  currentP.setPPr(this.getPPr(blockBox, cssMap));
}


could be changed to

Code:
else {
  doDefaultConvert(box);
}


This way it would be possible to customize the conversion without changing your code, by inheriting from XHTMLImporterImpl and overriding doDefaultConvert(). Maybe this first step can be done without major refactoring.
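To make the proposed hook concrete, here is a minimal, self-contained sketch of the idea (a template method with an overridable fall-through). Everything in it is hypothetical: Box is a stand-in that only carries a tag name, BaseImporter is not XHTMLImporterImpl, and the strings it collects stand in for the WordML objects the real importer would create.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a rendered CSS box; NOT the real Box class.
class Box {
    final String tagName;
    Box(String tagName) { this.tagName = tagName; }
}

// Hypothetical importer: the final else branch delegates to a protected hook.
class BaseImporter {
    // Output collected as plain strings to keep the sketch self-contained.
    protected final List<String> output = new ArrayList<>();

    public List<String> convert(List<Box> boxes) {
        for (Box box : boxes) {
            traverse(box);
        }
        return output;
    }

    private void traverse(Box box) {
        if ("table".equals(box.tagName)) {
            output.add("TABLE");
        } else if ("ol".equals(box.tagName) || "ul".equals(box.tagName)) {
            output.add("LIST");
        } else {
            doDefaultConvert(box); // the proposed extension point
        }
    }

    // Default behaviour: treat the box as an ordinary paragraph.
    protected void doDefaultConvert(Box box) {
        output.add("P(" + box.tagName + ")");
    }
}

// A subclass customizes only the fall-through case, e.g. for <input>.
class InputAwareImporter extends BaseImporter {
    @Override
    protected void doDefaultConvert(Box box) {
        if ("input".equals(box.tagName)) {
            output.add("FORM-FIELD");
        } else {
            super.doDefaultConvert(box); // keep the default for everything else
        }
    }
}
```

Because traverse() stays private and only the hook is protected, a subclass can change the default conversion without being able to break the table/list branches.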

Going a step further, it would be great to refactor XHTMLImporterImpl using one of these patterns:
- Listener Patterns (Callback Functions), or
- "SAX"ish Patterns (Handlers), or
- Rule Design Patterns (Conditions/Facts and Actions).

Code:
// Proposed API (does not exist yet): each rule pairs a matcher with a handler
xHTMLImporterImpl.getRules().add(new GenericMatcher("ol", box), new DefaultListHandler(contentAccessor));
xHTMLImporterImpl.getRules().add(new TableBoxMatcher(box), new DefaultTableHandler(contentAccessor));
xHTMLImporterImpl.getRules().add(new InputItemMatcher(box), new DefaultItemHandler(contentAccessor));


Code:
private void traverse(Box box, Box parent) {
    ...
    // apply every rule whose matcher accepts this box
    for (Rule rule : this.rules) {
        if (rule.getMatcher().matches(box)) {
            rule.getHandler().handle(box);
        }
    }
    ...
}
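A runnable version of the rule idea might look like the sketch below. It assumes Java 16+ (records) and uses java.util.function.Predicate as the matcher and Consumer as the handler; HtmlBox, Rule and RuleBasedImporter are hypothetical names, not docx4j classes. Like the loop above, every matching rule fires; the only addition is a fallback that runs when no rule matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical stand-in for a rendered CSS box (only the tag name matters here).
record HtmlBox(String tagName) {}

// One rule = a condition (matcher) plus an action (handler).
record Rule(Predicate<HtmlBox> matcher, Consumer<HtmlBox> handler) {}

class RuleBasedImporter {
    private final List<Rule> rules = new ArrayList<>();
    private final Consumer<HtmlBox> fallback; // default conversion when nothing matches

    RuleBasedImporter(Consumer<HtmlBox> fallback) {
        this.fallback = fallback;
    }

    List<Rule> getRules() {
        return rules;
    }

    void traverse(HtmlBox box) {
        boolean matched = false;
        // apply every rule whose matcher accepts this box
        for (Rule rule : rules) {
            if (rule.matcher().test(box)) {
                rule.handler().accept(box);
                matched = true;
            }
        }
        if (!matched) {
            fallback.accept(box);
        }
    }
}
```

Usage: register a rule such as new Rule(b -> b.tagName().equals("input"), b -> ...) and every input box is routed to that handler, while unmatched boxes still get the default paragraph treatment.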


Currently I use XHTMLImport to turn emphasised markup into a nice-looking Word document based on a Word template.

By emphasised markup (semantic HTML) I mean using only h1 to h6, p, strong, em, etc., with no div, no style, no JavaScript, etc.
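As an illustration of what "emphasised markup" rules out, a small checker could parse the XHTML with the JDK's DOM parser and report any element outside a whitelist, plus any inline style attribute. This is a hypothetical helper, not part of HtmlCleaner or docx4j, and the whitelist below is an assumption to be adjusted to the styles your Word template defines.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SemanticMarkupCheck {
    // Assumed whitelist of "semantic" elements.
    private static final Set<String> ALLOWED = Set.of(
        "html", "head", "title", "body",
        "h1", "h2", "h3", "h4", "h5", "h6",
        "p", "strong", "em", "ul", "ol", "li", "br");

    // Returns disallowed element names (in document order) and "name@style" for inline CSS.
    static List<String> violations(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        List<String> problems = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*"); // every element, root included
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            String name = el.getTagName();
            if (!ALLOWED.contains(name)) {
                problems.add(name);
            }
            if (el.hasAttribute("style")) {
                problems.add(name + "@style");
            }
        }
        return problems;
    }
}
```

Anything the check reports (div, script, inline styles) is what the cleaning step would have to remove before the conversion.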

The following workflow has worked well for me for preparing the Word document:
- create the XHTML markup with CKEditor
- clean and prepare the markup for conversion with HtmlCleaner
- load the Word template with docx4j
- convert the markup to WordML with XHTMLImport and attach it to the loaded template
- post-process the WordML with docx4j

Thx, Willi

Re: XHTMLImporterImpl and input Tag

Posted: Wed Mar 19, 2014 8:30 am
by jason
willi.firulais wrote:
This way it would be possible to customize the conversion without changing your code, by inheriting from XHTMLImporterImpl and overriding doDefaultConvert(). Maybe this first step can be done without major refactoring.

Going a step further, it would be great to refactor XHTMLImporterImpl using one of these patterns:
- Listener Patterns (Callback Functions), or
- "SAX"ish Patterns (Handlers), or
- Rule Design Patterns (Conditions/Facts and Actions).


Happy to accept contributions (under ASLv2) that add this kind of flexibility.

It'd be good to discuss any proposal here a bit first. I'll rename this thread to "refactor for easier customization".