
${substitution} with HTML

Posted: Sun Jan 15, 2012 9:03 am
by daceilo
Ok, hoping someone can help me out because I'm feeling brain dead. I need to take HTML (from a complex textarea in a web form) and insert it into a docx document. I can already do the following:

1. Convert the HTML into a part and attach that part to the end of a new document using docx4j
2. Replace the ${tags} with plain strings in a docx document

But I'm unsure how to insert a part in the correct place... Sorry if this is an FAQ; I've searched all over the forums for examples of this.

Re: ${substitution} with HTML

Posted: Sun Jan 15, 2012 12:00 pm
by jason
I assume you're importing the HTML using a docx4j nightly, the package org.docx4j.convert.in.xhtml, and something like:


                WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
                wordMLPackage.getMainDocumentPart().getContent().addAll(
                                convert(f, wordMLPackage) );
 


but that you'd instead like to attach the content at position n in some wordMLPackage2. You'd do:


                wordMLPackage2.getMainDocumentPart().getContent().addAll(
                                n, convert(f, wordMLPackage) );
 
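Since getContent() hands back an ordinary java.util.List<Object>, the positional insert follows the standard List.addAll(int, Collection) contract. Here is a minimal stdlib-only sketch of that behaviour, with strings standing in for the block-level objects. The class name InsertAtPosition and the sample values are made up for illustration and are not part of docx4j.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InsertAtPosition {

    // Returns a copy of content with all converted blocks inserted at position n.
    // Elements that were at n or later shift right by converted.size().
    static List<Object> insertAt(List<Object> content, int n, List<Object> converted) {
        List<Object> result = new ArrayList<>(content);
        result.addAll(n, converted);
        return result;
    }

    public static void main(String[] args) {
        // Strings are stand-ins for paragraphs, tables and so on.
        List<Object> content = Arrays.<Object>asList("heading", "${placeholder}", "footer");
        List<Object> converted = Arrays.<Object>asList("html-p1", "html-p2");

        // prints [heading, html-p1, html-p2, ${placeholder}, footer]
        System.out.println(insertAt(content, 1, converted));
    }
}
```

Note that the element previously at position n is pushed down rather than replaced, so a placeholder paragraph would still have to be removed separately.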


Now how you determine your position n is another question. Cut/pasted from elsewhere:

There are three approaches for finding the relevant block:
• manually
• via XPath
• via TraversalUtil

TraversalUtil is the recommended approach, mainly because of a limitation in using XPath with JAXB (explained below).

Explanations of the three approaches follow.

Common to all of them, however, is the question of how to identify what you are looking for.
• Paragraphs don't have IDs, so you might search for a particular string.
• Or you might search for the first paragraph following a section break.
• A good approach is to use content controls (which can have IDs), and to search for your content control by ID, title or tag.

Manual approach

The manual approach is to iterate through the block-level elements in the document yourself, looking for the paragraph, table or content control which matches your criteria. To do this, you'd use this method on the org.docx4j.wml.Body element:
    public List<Object> getEGBlockLevelElts()
 
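To make the manual scan concrete, here is a plain-Java model of it that runs without docx4j on the classpath. Para and OtherBlock are hypothetical stand-ins for real block-level classes such as org.docx4j.wml.P, and the single text field is a simplification: in a real paragraph the text is spread over runs, so a placeholder may be split across several of them.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualScan {

    // Hypothetical stand-in for a paragraph (org.docx4j.wml.P in a real document).
    static final class Para {
        final String text;
        Para(String text) { this.text = text; }
    }

    // Hypothetical stand-in for any other block-level element, such as a table.
    static final class OtherBlock { }

    // Returns the index of the first paragraph whose text contains marker, or -1.
    static int indexOfParagraphContaining(List<Object> blocks, String marker) {
        for (int i = 0; i < blocks.size(); i++) {
            Object block = blocks.get(i);
            if (block instanceof Para && ((Para) block).text.contains(marker)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Object> blocks = new ArrayList<>();
        blocks.add(new Para("Dear customer,"));
        blocks.add(new OtherBlock());
        blocks.add(new Para("${body}"));

        // prints 2
        System.out.println(indexOfParagraphContaining(blocks, "${body}"));
    }
}
```

Because the loop runs over the top-level list itself, the index it returns can be used directly as the position n.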


XPath approach

Underlying this approach is the use of XPath to select JAXB nodes:
        MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
        String xpath = "//w:p";        
        List<Object> list = documentPart.getJAXBNodesViaXPath(xpath, false);
 


You then find the index of the returned node in EGBlockLevelElts.

Beware: there is a limitation to using XPath with JAXB. XPath expressions are evaluated against the XML document as it was when first opened in docx4j. You can refresh the associated XML document once only, by passing true to getJAXBNodesViaXPath. Refreshing it again (with current JAXB 2.1.x or 2.2.x) will cause an error, so you need to be a bit careful!

TraversalUtil approach

TraversalUtil is a general approach for traversing the JAXB object tree in the main document part. TraversalUtil has an interface Callback, which you use to specify how you want to traverse the nodes, and what you want to do to them.

TraversalUtil can be used to find a node; you then get the index of the returned node in EGBlockLevelElts.
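The idea of "traverse with a callback, then map the hit back to a top-level index" can be modelled in plain Java as below. This is only a sketch of the pattern and does not reproduce the real TraversalUtil.Callback interface, which has its own methods. Node, anyMatch and topLevelIndexes are hypothetical names, and a java.util.function.Predicate plays the role of the callback.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class CallbackTraversal {

    // Hypothetical tree node; a real document tree holds paragraphs, runs, text and so on.
    static final class Node {
        final String text;            // null for pure containers
        final List<Node> children;
        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // Depth-first walk: true if node or any of its descendants satisfies the callback.
    static boolean anyMatch(Node node, Predicate<Node> callback) {
        if (callback.test(node)) {
            return true;
        }
        for (Node child : node.children) {
            if (anyMatch(child, callback)) {
                return true;
            }
        }
        return false;
    }

    // Returns the indexes of the top-level blocks that contain at least one match.
    static List<Integer> topLevelIndexes(List<Node> blocks, Predicate<Node> callback) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            if (anyMatch(blocks.get(i), callback)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Node plain = new Node(null, new Node(null, new Node("Hello")));
        Node marked = new Node(null, new Node(null, new Node("$HTML_BODY$")));
        List<Node> blocks = Arrays.asList(plain, marked, plain);

        // prints [1]
        System.out.println(topLevelIndexes(blocks,
                n -> n.text != null && n.text.contains("$HTML")));
    }
}
```

Walking each top-level block separately means the index is known at the moment a match is found, so no second lookup is needed.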

Re: ${substitution} with HTML

Posted: Tue Jan 31, 2012 5:52 pm
by daceilo
Thanks! I definitely overcomplicated it... It was extremely easy as you pointed out as soon as I stopped thinking too hard ;)

Thanks again.

Re: ${substitution} with HTML

Posted: Fri Mar 23, 2012 3:55 am
by Empirica
Sorry, I'm having the same problem here and I don't understand what you mean by

jason wrote: You then find the index of the returned node in EGBlockLevelElts.



The code in question is below. Replacing a placeholder with text works fine.


private void replacePlaceholders() throws JAXBException {
                Properties templateProperties = new Properties();
               
                try {
                        templateProperties.load(getClass().getResourceAsStream("template.properties"));
                }catch (Exception e){
                        e.printStackTrace();
                }
                                       
                // givenPackage is a WordprocessingMLPackage
                List<Object> texts = givenPackage.getMainDocumentPart()
                                .getJAXBNodesViaXPath(XPATH_TO_SELECT_TEXT_NODES, true);

                for (Object obj : texts) {
                        Text text = (Text) ((JAXBElement) obj).getValue();
                       
                        if(isHTMLPlaceholder(text.getValue())){
                                //replace element with HTML content
                        }
                        else{
                                //replace Node with normal text
                                String textValue = replacePlaceholdersByValue(text.getValue(),templateProperties);
                                text.setValue(textValue);
                        }
                       
                }
        }      
 


I guess I then have to do something like this:

        private void replaceHTML(){
                givenPackage.getMainDocumentPart().getContent().addAll(INDEX, HTML_CHUNK);
        }
 


But I still don't know how to get the correct index, or how to replace the placeholder with HTML content at that position.

Please help!

Re: ${substitution} with HTML

Posted: Mon Mar 26, 2012 9:13 pm
by jason
If you have an object o, and you want to find its position in the contents, you can do:

givenPackage.getMainDocumentPart().getContent().indexOf(o)
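One caveat worth spelling out: indexOf only finds o if o is itself one of the top-level blocks in the list. A text node selected via XPath sits inside a run inside a paragraph, so you would first climb to the enclosing block. The sketch below models that climb in plain Java. The Node class and its parent field are hypothetical stand-ins. As far as I know docx4j objects expose a similar getParent() through the Child interface, but treat that as an assumption to check against your docx4j version.

```java
import java.util.Arrays;
import java.util.List;

public class TopLevelIndex {

    // Hypothetical node with a parent pointer, standing in for a docx4j object.
    static final class Node {
        final String name;
        Node parent; // stays null for top-level blocks in this model
        Node(String name) { this.name = name; }
        Node add(Node child) { child.parent = this; return this; }
    }

    // Climbs from a nested node to the ancestor that is in content.
    // Returns that ancestor's index, or -1 if no ancestor is a top-level block.
    static int indexOfEnclosingBlock(List<Node> content, Node nested) {
        for (Node n = nested; n != null; n = n.parent) {
            int i = content.indexOf(n);
            if (i >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Node text = new Node("w:t");
        Node paragraph = new Node("w:p").add(new Node("w:r").add(text));
        List<Node> content = Arrays.asList(new Node("w:p"), paragraph);

        System.out.println(content.indexOf(text));                 // prints -1: text is nested
        System.out.println(indexOfEnclosingBlock(content, text));  // prints 1: its paragraph
    }
}
```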

Re: ${substitution} with HTML

Posted: Thu Mar 29, 2012 10:05 pm
by Empirica
Thank you. For anyone else searching for a solution, here is the complete code for text AND HTML replacement in Word documents using docx4j:

//Traces the whole document tree for each word
        private void replacePlaceholders() throws JAXBException {
                Properties templateProperties = new Properties();
                int index;
                               
                try {
                        templateProperties.load(getClass().getResourceAsStream("template.properties"));
                }catch (Exception e){
                        e.printStackTrace();
                }

                List<Object> texts = givenPackage.getMainDocumentPart()
                                .getJAXBNodesViaXPath(XPATH_TO_SELECT_TEXT_NODES, true);

                for (Object obj : texts) {
                        Text text = (Text) ((JAXBElement) obj).getValue();
                       
                        if(isHTMLPlaceholder(text.getValue())){                        
                                //replace element with HTML content    
                                logger.info("New HTML Placeholder found");
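                                // NOTE: this is the position in the XPath result list (texts),
                                // which is not necessarily a valid position in getContent().
                                index = texts.indexOf(obj);
                                replaceHTML(index,text.getValue(),templateProperties);

                                //remove superfluous placeholder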
                                givenPackage.getMainDocumentPart().getContent().remove(index);

                        }
                        else{

                                //replace Node with normal text
                                if (text.getValue().startsWith("$") && text.getValue().endsWith("$")){
                                        logger.info("New TEXT Placeholder found");
                                        String textValue = replacePlaceholdersByValue(text.getValue(),templateProperties);
                                        text.setValue(textValue);
                                }
                        }

                }
        }

        private boolean isHTMLPlaceholder(String placeholderValue){
                if (placeholderValue.startsWith("$") && placeholderValue.endsWith("$")) {
                        if( placeholderValue.contains("$HTML")) return true;
                }
                return false;
        }

        //Replace Placeholders with HTML
        private void replaceHTML(int index, String placeholderValue, Properties templateProperties){

                String methodName = "";

                try {                  
                        if (templateProperties.containsKey(placeholderValue)) {
                                methodName = (String)templateProperties.get(placeholderValue);
                                logger.debug("Placeholder found: " + methodName);
                                placeholderValue = (String) givenData.getClass().getMethod(methodName,null).invoke(givenData,null);
                        }

                } catch (Exception e) {
                        logger.error(e.getMessage());
                        e.printStackTrace();
                }

                String html = "<html>" + placeholderValue + "</html>";
                AlternativeFormatInputPart afiPart = null;

                //Create HTML
                try {
                        logger.info("Trying to create an html part.");
                        afiPart = new AlternativeFormatInputPart(new PartName("/hw" + String.valueOf(index) + ".html")); //CAUTION: each html part needs a new name!!
                } catch (InvalidFormatException e) {
                        e.printStackTrace();
                }

                //Parse Content
                logger.info("Get the Bytes and set the Content type of the html part.");
                afiPart.setBinaryData(html.getBytes());
                afiPart.setContentType(new ContentType("text/html"));

                Relationship altChunkRel = null;

                try {
                        logger.info("adding the Target Path...");
                        altChunkRel = givenPackage.getMainDocumentPart().addTargetPart(afiPart);
                } catch (InvalidFormatException e) {
                        e.printStackTrace();
                }

                //Add HTML to document
                logger.info("Adding HTML to the document..");
                CTAltChunk ac = Context.getWmlObjectFactory().createCTAltChunk();
                ac.setId(altChunkRel.getId() );
                givenPackage.getMainDocumentPart().getContent().add(index-1,ac);
        }

        //Replace Placeholders with text
        private String replacePlaceholdersByValue(String placeholderValue, Properties templateProperties) {
                String methodName = "";
                try {                  
                        if (templateProperties.containsKey(placeholderValue)) {
                                methodName = (String)templateProperties.get(placeholderValue);
                                logger.debug("Placeholder found: " + methodName);
                                placeholderValue = (String) givenData.getClass().getMethod(methodName,null).invoke(givenData,null);
                        }
                } catch (Exception e) {
                        logger.error(e.getMessage());
                        e.printStackTrace();
                }

                return placeholderValue;
        }
 


Hope it helps someone.

Cheers
- Empirica

Re: ${substitution} with HTML

Posted: Mon Apr 06, 2020 9:58 pm
by subhrajlahiri
Hi,

I have been trying this code, but I end up getting an error on the following line:

givenPackage.getMainDocumentPart().getContent().add(index-1,ac);


The error is:
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 80, Size: 11
   at java.util.ArrayList.rangeCheckForAdd(ArrayList.java:665)
   at java.util.ArrayList.add(ArrayList.java:477)
   at per.subhra.docxtemplate.App.replaceHTML(App.java:418)
   at per.subhra.docxtemplate.App.replaceHtmlPlaceHolders(App.java:136)
   at per.subhra.docxtemplate.App.main(App.java:117)


Any help is appreciated. Thanks.
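
Going by the trace, a likely cause is that the index passed to add() was computed with texts.indexOf(obj), so it is a position in the XPath result list and not in getContent(). Assuming XPATH_TO_SELECT_TEXT_NODES selects text elements, that list has one entry per text run and can easily be much longer than the 11 top-level blocks. The stdlib-only sketch below reproduces the mismatch and shows an in-place replacement that rejects such an index. IndexMismatch, replaceBlock and the list sizes are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexMismatch {

    // Replaces the block at blockIndex in place and fails fast on an index
    // that cannot be a position in content.
    static void replaceBlock(List<Object> content, int blockIndex, Object replacement) {
        if (blockIndex < 0 || blockIndex >= content.size()) {
            throw new IllegalArgumentException(
                    "index " + blockIndex + " is not a position in a list of size " + content.size());
        }
        content.set(blockIndex, replacement);
    }

    public static void main(String[] args) {
        // 11 top-level blocks, as in the reported "Size: 11".
        List<Object> content = new ArrayList<>();
        for (int i = 0; i < 11; i++) {
            content.add("block-" + i);
        }

        // Stand-in for the XPath result: many more text nodes than blocks (120 is arbitrary).
        List<Object> texts = new ArrayList<>();
        for (int i = 0; i < 120; i++) {
            texts.add("text-" + i);
        }

        int flatIndex = texts.indexOf("text-81"); // 81, a position among text nodes
        try {
            content.add(flatIndex - 1, "altChunk"); // same call shape as the failing line
        } catch (IndexOutOfBoundsException e) {
            System.out.println("reproduced: " + e.getMessage());
        }

        replaceBlock(content, 4, "altChunk"); // index taken from content itself
        System.out.println(content.get(4));   // prints altChunk
    }
}
```

Using set() on an index taken from getContent() itself also avoids the separate add-then-remove step, which is easy to get off by one.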