populate or retrieve word using field

PostPosted: Wed Aug 08, 2012 8:02 pm
by buptstehc
Hi, all! I am new to docx4j. I need to populate a Word document, or retrieve content from it, using fields. That is, there will be a template with defined fields like this:
-------------------------
<FieldBegin>


<FieldEnd>
-------------------------

1. After locating 'FieldBegin' and 'FieldEnd', I want to retrieve the content between these two fields and save it as HTML.
2. I want to insert HTML after 'FieldBegin'.
3. The HTML may contain text, tables and images.

I want to know whether docx4j can handle my situation. Thanks!

Re: populate or retrieve word using field

PostPosted: Thu Aug 09, 2012 9:19 am
by jason
There is some machinery to manipulate fields in org.docx4j.model.fields

For example:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.docx4j.XmlUtils;
    import org.docx4j.model.fields.merge.DataFieldName;
    import org.docx4j.model.fields.merge.MailMerger;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

    // The class name is arbitrary; only the main method comes from the sample.
    public class MailMergeExample {

        public static void main(String[] args) throws Exception {

            // Load a template that contains MERGEFIELDs.
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(
                    new java.io.File(
                            System.getProperty("user.dir") + "/src/test/resources/MERGEFIELD.docx"));

            // One map per output record: field name -> replacement value.
            List<Map<DataFieldName, String>> data = new ArrayList<Map<DataFieldName, String>>();

            Map<DataFieldName, String> map = new HashMap<DataFieldName, String>();
            map.put(new DataFieldName("KundenNAme"), "Daffy duck");
            map.put(new DataFieldName("Kundenname"), "Plutext");
            map.put(new DataFieldName("Kundenstrasse"), "Bourke Street");
            data.add(map);

            map = new HashMap<DataFieldName, String>();
            map.put(new DataFieldName("Kundenname"), "Jason");
            map.put(new DataFieldName("Kundenstrasse"), "Collins Street");
            data.add(map);

            // Dump the main document part before the merge.
            System.out.println(XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(), true, true));

            // Merge all records into a single output package.
            WordprocessingMLPackage output = MailMerger.getConsolidatedResultCrude(wordMLPackage, data);

            // Dump the main document part after the merge, then save it.
            System.out.println(XmlUtils.marshaltoString(output.getMainDocumentPart().getJaxbElement(), true, true));

            output.save(new java.io.File(
                    System.getProperty("user.dir") + "/mergefield1-OUT.docx"));
        }
    }


There are three ways to bring in HTML:

1a. Use an altChunk, and leave it to Word to turn this into WordML
1b. Use an altChunk with XHTML content, and use docx4j to turn it into WordML (see sample AltChunkXHTMLRoundTrip)
2. Use XHTMLImporter (see samples ConvertInXHTML*)

You'd have to tie these pieces together yourself.

If you can use content controls instead of fields, you'll have less work to do as XHTML and pictures are already handled in that approach

Re: populate or retrieve word using field

PostPosted: Thu Aug 09, 2012 6:57 pm
by buptstehc
jason wrote:There is some machinery to manipulate fields in org.docx4j.model.fields


Thanks jason! I have completed the conversion from HTML to docx by following 'sample/AltChunkAddOfTypeHtml.java'. However, the field example doesn't suit my case. My problem is how to locate an arbitrary position in the Word document using a field, and then insert content after that field or retrieve the content between 'FieldBegin' and 'FieldEnd'. By the way, are there any javadocs for v2.8.0?

Re: populate or retrieve word using field

PostPosted: Fri Aug 10, 2012 9:15 am
by jason
buptstehc wrote:is there any javadocs for v2.8.0?


http://search.maven.org/#artifactdetail ... .8.0%7Cjar

Re: populate or retrieve word using field

PostPosted: Sat Aug 11, 2012 7:37 pm
by buptstehc
Thanks again, jason! After reading the samples, I found that there may be two ways to navigate to an arbitrary position in the MainDocumentPart: XPath and TraversalUtil.
So my question is: after I have found the insertion point (e.g. using a bookmark, http://www.docx4java.org/forums/docx-java-f6/can-i-read-bookmark-from-word-document-t161.html), how can I use addAltChunk() to insert my HTML there, along the lines of 'wordMLPackage.getMainDocumentPart().addAltChunk(AltChunkType.Html, html.getBytes())'?
In addition, is there any way to convert the Word fragment located between two bookmarks to HTML?

Re: populate or retrieve word using field

PostPosted: Sat Aug 11, 2012 7:59 pm
by jason
buptstehc wrote:retrieve between the 'FieldBegin' and 'FieldEnd


Did you see the FieldLocator class? It returns a list of paragraphs containing field begins.
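
To make "paragraphs containing field begins" concrete, here is a minimal sketch that works on the raw WordprocessingML with plain JDK DOM calls. It is not FieldLocator's implementation and does not use docx4j at all. The class name, the sample XML, and the idea that 'FieldBegin' is a MERGEFIELD are assumptions made for illustration. It only looks at complex fields (w:fldChar) and ignores w:fldSimple.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FieldBeginSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: the second paragraph holds a complex field whose
    // instruction text is assumed to be "MERGEFIELD FieldBegin".
    static final String SAMPLE =
        "<w:document xmlns:w=\"" + W + "\"><w:body>"
        + "<w:p><w:r><w:t>intro</w:t></w:r></w:p>"
        + "<w:p><w:r><w:fldChar w:fldCharType=\"begin\"/></w:r>"
        + "<w:r><w:instrText> MERGEFIELD FieldBegin </w:instrText></w:r>"
        + "<w:r><w:fldChar w:fldCharType=\"end\"/></w:r></w:p>"
        + "</w:body></w:document>";

    /** Returns the text content of each w:p that contains a field begin marker. */
    static List<String> paragraphsWithFieldBegin(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));
            List<String> hits = new ArrayList<>();
            NodeList paragraphs = doc.getElementsByTagNameNS(W, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element p = (Element) paragraphs.item(i);
                NodeList fldChars = p.getElementsByTagNameNS(W, "fldChar");
                for (int j = 0; j < fldChars.getLength(); j++) {
                    Element fldChar = (Element) fldChars.item(j);
                    if ("begin".equals(fldChar.getAttributeNS(W, "fldCharType"))) {
                        hits.add(p.getTextContent()); // includes the instruction text
                        break;
                    }
                }
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(paragraphsWithFieldBegin(SAMPLE));
    }
}
```

Once the paragraphs holding the two marker fields are known, everything between them in document order is the content to retrieve.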



buptstehc wrote:how can i use addAltChunk() to insert my html


From wml.xsd:

        <xsd:group name="EG_BlockLevelElts">
                <xsd:annotation>
                        <xsd:appinfo>
                                <jaxb:property name="Content"/>
                        </xsd:appinfo>
                </xsd:annotation>
                <xsd:choice>
                        <xsd:group ref="EG_BlockLevelChunkElts" minOccurs="0"
                                maxOccurs="unbounded"/>
                        <xsd:element name="altChunk" type="CT_AltChunk" minOccurs="0"
                                maxOccurs="unbounded">
                                <xsd:annotation>
                                        <xsd:documentation>Anchor for Imported External
                                                Content</xsd:documentation>
                                </xsd:annotation>
                        </xsd:element>
                </xsd:choice>
        </xsd:group>
 


It looks like altChunk is a block-level element, i.e. you can't insert it inside a paragraph; you'll need to split your paragraph.

(It is looking much easier to use a content control based approach, as I previously suggested).
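
As a rough illustration of "split your paragraph", the following sketch uses plain JDK DOM calls on the raw XML, not docx4j. It cuts a paragraph at a bookmark and places a w:altChunk between the two halves, so that the altChunk ends up as a sibling of the paragraphs. The class name, the sample XML and the relationship id are hypothetical. A real document would also need the alternative format part and its relationship (which addAltChunk() takes care of), and the second half would need the original paragraph properties copied over.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AltChunkPlacementSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Hypothetical sample: one paragraph with a bookmark named "here" in the middle.
    static final String SAMPLE =
        "<w:document xmlns:w=\"" + W + "\"><w:body><w:p>"
        + "<w:r><w:t>before</w:t></w:r>"
        + "<w:bookmarkStart w:id=\"0\" w:name=\"here\"/><w:bookmarkEnd w:id=\"0\"/>"
        + "<w:r><w:t>after</w:t></w:r>"
        + "</w:p></w:body></w:document>";

    /** Splits the paragraph at the bookmark and returns the body's child element names. */
    static String splitAtBookmark(String documentXml, String bookmarkName, String relId) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));
            NodeList starts = doc.getElementsByTagNameNS(W, "bookmarkStart");
            for (int i = 0; i < starts.getLength(); i++) {
                Element start = (Element) starts.item(i);
                if (!bookmarkName.equals(start.getAttributeNS(W, "name"))) continue;
                Element p = (Element) start.getParentNode(); // assumes the bookmark sits directly in a w:p
                Element tail = doc.createElementNS(W, "w:p");
                // Move everything after the bookmark start into the new trailing paragraph.
                for (Node n = start.getNextSibling(); n != null; ) {
                    Node next = n.getNextSibling();
                    tail.appendChild(n);
                    n = next;
                }
                Element chunk = doc.createElementNS(W, "w:altChunk");
                chunk.setAttributeNS(R, "r:id", relId); // relId must match a real relationship
                Node parent = p.getParentNode();
                parent.insertBefore(tail, p.getNextSibling());
                parent.insertBefore(chunk, tail);
                break;
            }
            StringBuilder names = new StringBuilder();
            Node body = doc.getElementsByTagNameNS(W, "body").item(0);
            for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) continue;
                if (names.length() > 0) names.append(',');
                names.append(n.getLocalName());
            }
            return names.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not process document XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(splitAtBookmark(SAMPLE, "here", "rIdHypothetical"));
    }
}
```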

buptstehc wrote:is there any way to convert the word fragment located between two bookmarks to html ?


Please post this question in a new thread, so it is easier for other people to find via Google and benefit from.

Re: populate or retrieve word using field

PostPosted: Sun Aug 12, 2012 7:07 pm
by buptstehc
Thanks for your reply, jason. After studying the example code for content controls, I still have not figured out a solution, so let me explain my situation in more detail. My client will submit several pieces of rich text, formatted as HTML, to the server, and the server will then insert this HTML into a predefined Word template. Going by jason's suggestions above, content controls may be the best choice. However, it seems that docx4j can only bind custom XML that is stored inside the docx file. So I want to know: is it possible to replace the contents of a content control with an external HTML file, or to insert an altChunk where the content control is located and then bring in the HTML through that altChunk?

Re: populate or retrieve word using field

PostPosted: Sun Aug 12, 2012 7:22 pm
by jason
You can use (alpha quality) http://www.opendope.org/downloads/autho ... /setup.exe to bind a content control to an element containing escaped XHTML; it should put "od:ContentType=application/xhtml+xml" in the w:tag

With that, the ContentControlBindingExtensions sample ought to automatically convert that content to WordML.
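
To check whether a content control actually carries that marker, the value of its w:tag can be inspected. The sketch below assumes the tag value is a list of key=value pairs separated by '&' (for example "od:xpath=dj1&od:ContentType=application/xhtml+xml"). That layout, the "od:xpath" key and the class name are assumptions for illustration; it uses only JDK string handling, not docx4j's own tag parsing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OdTagSketch {

    /** Splits a tag value such as "a=1&b=2" into its key/value pairs. */
    static Map<String, String> parseTag(String tagValue) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String pair : tagValue.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                pairs.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return pairs;
    }

    /** True if the tag marks the bound content as XHTML. */
    static boolean isXhtml(String tagValue) {
        return "application/xhtml+xml".equals(parseTag(tagValue).get("od:ContentType"));
    }

    public static void main(String[] args) {
        System.out.println(isXhtml("od:xpath=dj1&od:ContentType=application/xhtml+xml"));
        System.out.println(isXhtml("od:xpath=dj1"));
    }
}
```

Note that inside document.xml the '&' in the attribute value is stored as &amp;amp;, so an XML parser hands back the unescaped form shown here.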

Re: populate or retrieve word using field

PostPosted: Sun Aug 12, 2012 7:33 pm
by jason
buptstehc wrote: however, it seems that doc4j only bind custom xml archived in the docx file


See https://github.com/plutext/OpenDoPE-WAR for an example of merging external XML with the docx.

In the approach I am suggesting, you'd put your escaped XHTML into that external XML file.

Re: populate or retrieve word using field

PostPosted: Tue Aug 14, 2012 1:29 am
by buptstehc
Hi, jason! I did as you said, but I ran into a runtime error. Here is what I did:
1. Created a content control by clicking the 'Bind this text' button in the OpenDoPE panel.
2. Set its XPath to 'journal/dj1' and its XPath id to 'dj1'.
3. Saved 'www.google.com' as plain HTML and transformed it to XHTML using JTidy.
4. Inserted the XHTML into an XML file like this:
<journal>
<dj1> xhtml</dj1>
</journal>
5. Ran this binding code:
    FileInputStream xfs = new FileInputStream("resource/journal.xml");
    FileInputStream dfs = new FileInputStream("resource/dj.docx");

    LoadFromZipNG loader = new LoadFromZipNG();
    WordprocessingMLPackage wordMLPackage = (WordprocessingMLPackage) loader.get(dfs);

    String itemId = CustomXmlUtils.getCustomXmlItemId(wordMLPackage).toLowerCase();
    CustomXmlDataStoragePart customXmlDataStoragePart = wordMLPackage
            .getCustomXmlDataStorageParts().get(itemId);
    customXmlDataStoragePart.getData().setDocument(xfs);

    BindingHandler.applyBindings(wordMLPackage);
    wordMLPackage.save(new java.io.File("result.docx"));

6. The error occurs at 'customXmlDataStoragePart.getData().setDocument(xfs);':
Exception in thread "main" java.lang.NullPointerException
at org.docx4j.openpackaging.parts.XmlPart.setDocument(XmlPart.java:143)

Is this error related to the XHTML generated by JTidy? In addition, since the HTML may contain tables and images besides text, can a text content control support these elements? Thanks!

Re: populate or retrieve word using field

PostPosted: Tue Aug 14, 2012 8:34 am
by jason
In your code, is customXmlDataStoragePart.getData() null, or is customXmlDataStoragePart null?

I suggest you also start with a small snippet of XHTML suitably escaped, for example "&lt;div>&lt;p>hello world&lt;/p>&lt;/div>", and get more ambitious after that.

To have images in the HTML, you can use base64 encoded images, or see the XHTML import code to see what else is supported. You can experiment with that directly to, for example, test table support.

Note, your custom xml file can also include images saved as base 64, not wrapped in HTML. The add-in should detect and handle the addition of such.
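
One way to get "suitably escaped" XHTML without escaping it by hand is to let an XML API do it. The sketch below is an illustration using only the JDK, not part of docx4j. It puts the XHTML into the dj1 element as a text node, so the serializer escapes the markup, and then reads it back to show that the original string survives. The element names journal and dj1 come from the XML posted earlier in this thread; the class and method names are made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class EscapedXhtmlDataSketch {

    /** Builds <journal><dj1>...</dj1></journal> with the XHTML stored as escaped text. */
    static String buildJournalXml(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element journal = doc.createElement("journal");
            Element dj1 = doc.createElement("dj1");
            dj1.setTextContent(xhtml); // stored as text, so markup characters get escaped on output
            journal.appendChild(dj1);
            doc.appendChild(journal);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build the data XML", e);
        }
    }

    /** Parses the data XML again and returns the unescaped XHTML held in the root element. */
    static String readBack(String journalXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(journalXml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the data XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = buildJournalXml("<div><p>hello world</p></div>");
        System.out.println(xml);
        System.out.println(readBack(xml));
    }
}
```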

Re: populate or retrieve word using field

PostPosted: Tue Aug 14, 2012 1:50 pm
by buptstehc
Thanks, I will give it a try! By the way, I did not find the tag "od:ContentType=application/xhtml+xml" in the docx after inserting a content control using OpenDoPE. Is it generated automatically, or does it need to be added manually? Also, I have to insert rich text including tables and images into the Word document, and it seems that only 'wrap with condition' and 'wrap with repeat' suit that, while 'bind this text' only supports text. However, I do not need a condition or a repeat.

Re: populate or retrieve word using field

PostPosted: Wed Aug 15, 2012 11:46 pm
by jason
You can use the button or drag/drop node from the task pane, or possibly by right clicking.

"od:ContentType=application/xhtml+xml" will be added to the content control's w:tag automatically, if the add in detected XHTML.

Sample data to try:

<mydata>
  <myelement>
    &lt;div&gt;
      &lt;p&gt;Hello World!&lt;/p&gt;
    &lt;/div&gt;
  </myelement>
</mydata>
 


Some additional notes:

Content controls have to be created at the block/para level or run level; this is mandated by Microsoft's OpenXML format.

You should take care to ensure that if the XHTML is going to convert to block level content (paragraphs/tables), then the content control it will go in is also block level (i.e. in Word you have w:sdt/w:sdtContent/w:p, rather than w:p/w:sdt/w:sdtContent).

To get a block level content control, don't drag XHTML from the RHS panel ! Use the "Add XPath" button with your cursor on an empty paragraph (with no selection, or the entire paragraph including paragraph marker selected). If the cursor is in the middle of a paragraph, it'd create a run level content control.

OR (while on an empty P) on the RHS panel, right click > data value > insert content control
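
To verify which kind of content control ended up in the document, the position of each w:sdt in document.xml can be checked. The sketch below is a simplified illustration using plain JDK DOM calls. It counts a w:sdt whose parent is a w:p as run level and everything else as block level, which lumps row and cell level controls in with block level for brevity. The class name and the sample XML are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SdtLevelSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: one w:sdt directly in the body, one inside a paragraph.
    static final String SAMPLE =
        "<w:document xmlns:w=\"" + W + "\"><w:body>"
        + "<w:sdt><w:sdtPr><w:tag w:val=\"block\"/></w:sdtPr>"
        + "<w:sdtContent><w:p/></w:sdtContent></w:sdt>"
        + "<w:p><w:sdt><w:sdtPr><w:tag w:val=\"inline\"/></w:sdtPr>"
        + "<w:sdtContent><w:r><w:t>x</w:t></w:r></w:sdtContent></w:sdt></w:p>"
        + "</w:body></w:document>";

    /** Returns a summary such as "block=1,run=1" for the content controls found. */
    static String classify(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));
            int block = 0;
            int run = 0;
            NodeList sdts = doc.getElementsByTagNameNS(W, "sdt");
            for (int i = 0; i < sdts.getLength(); i++) {
                Node parent = sdts.item(i).getParentNode();
                boolean insideParagraph =
                        W.equals(parent.getNamespaceURI()) && "p".equals(parent.getLocalName());
                if (insideParagraph) {
                    run++;   // w:p/w:sdt/w:sdtContent
                } else {
                    block++; // w:sdt/w:sdtContent/w:p (and, here, anything else)
                }
            }
            return "block=" + block + ",run=" + run;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(SAMPLE));
    }
}
```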

If your XML file contains &nbsp;, which isn't an entity built in to XML, it needs to be replaced with &#160;
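
The reason is that XML only predefines five entities (amp, lt, gt, quot, apos), so a parser rejects &nbsp; unless a DTD declares it. The numeric character reference &#160; denotes the same no-break space and needs no declaration. A minimal sketch using only the JDK, with made-up class and method names, to be applied to the XHTML before it is escaped and placed in the data file:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class NbspFixSketch {

    /** Replaces the HTML-only entity with its numeric character reference. */
    static String replaceNbsp(String xhtml) {
        return xhtml.replace("&nbsp;", "&#160;");
    }

    /** True if the string parses as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // e.g. a reference to an undeclared entity
        }
    }

    public static void main(String[] args) {
        String xhtml = "<p>a&nbsp;b</p>";
        System.out.println(isWellFormed(xhtml));
        System.out.println(isWellFormed(replaceNbsp(xhtml)));
    }
}
```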

Re: populate or retrieve word using field

PostPosted: Fri Aug 17, 2012 5:16 pm
by buptstehc
Thanks for your generous help, jason! I have completed the XHTML import using a content control. I have summarized the process below in case it helps someone else:
1. Download and install the OpenDoPE plugin for Word from http://www.opendope.org/downloads/authoring-friendly/setup.exe
2. Create a new docx file, e.g. 'dj.docx'.
3. In the ribbon-template manager, click 'show xml' and choose 'copy the contents of an existing xml file to create', selecting e.g. 'sample.xml'. (Note that this sample XML will be replaced later by an external file with the same schema, and the element that the XPath in step 4 points to must contain escaped XHTML.)
4. In the ribbon-template manager, click 'add xpath' and enter the XPath '/journal/dj1' to which the content control should be bound, then save the document and exit Word.
5. Create an external XML file, e.g. 'journal.xml' (its content may come from a database), then run the sample code, e.g. 'sample.java'.

In addition, the steps above may be a little too technical for my clients, who know little about programming. So the only thing I want the clients to do is create the content controls using Word's 'development tools' in the ribbon rather than OpenDoPE, and name these controls with predefined values. After the clients upload these docx files, my program will do the following:
1. Create a custom XML part and its data storage part, following the sample code 'ContentControlsAddCustomXmlDataStoragePart.java'.
2. Create the XPath binding part with predefined values.
3. Create the initial XML data with escaped XHTML using CustomXmlDataStorage's setDocument().
4. Find each content control by its name and set its XPath id to the value defined in step 2.

My question is: how can I do steps 2 and 4, and is there any API for this in docx4j? Thanks again!

Re: populate or retrieve word using field

PostPosted: Sat Aug 18, 2012 12:22 am
by jason
buptstehc wrote:2. create xpath binding part with predefined values
:
4. find the content control according to its name and set it's xpath id with values defined in step 2.

my question is how can i finish the step 2 & 4 and is there any api in docx4j? thanks again!


Yes, with your approach you need to create the XPaths custom xml part.

You could do it the same way as in sample code 'ContentControlsAddCustomXmlDataStoragePart.java', but since there is org.docx4j.openpackaging.parts.opendope.XPathsPart which extends JaxbCustomXmlDataStoragePart<org.opendope.xpaths.Xpaths>, you'd be better off using the constructor there to create the part, then populate it, then do addTargetPart.

Regarding step 4, I recommend using TraversalUtil to find the content controls; have a look at OpenDoPEHandler to see how w:sdtPr elements are manipulated (there are also two examples in there of using TraversalUtil to do stuff with content controls).

hope this helps .. Jason
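
For readers who want to see what step 4 amounts to at the XML level, here is a rough sketch with plain JDK DOM calls. It is not the TraversalUtil/OpenDoPEHandler route recommended above. It finds a content control by its title (w:alias) and writes a w:tag value. The class name, the sample XML, and the tag value "od:xpath=dj1" are assumptions for illustration. A newly created w:tag is simply appended to w:sdtPr without checking the schema's element order, and whatever else the binding needs is not covered here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SdtTagSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: a block-level content control titled "dj1" with no tag yet.
    static final String SAMPLE =
        "<w:document xmlns:w=\"" + W + "\"><w:body><w:sdt>"
        + "<w:sdtPr><w:alias w:val=\"dj1\"/></w:sdtPr>"
        + "<w:sdtContent><w:p/></w:sdtContent>"
        + "</w:sdt></w:body></w:document>";

    private static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the w:sdtPr whose w:alias has the given value, or null if there is none. */
    private static Element sdtPrByAlias(Document doc, String alias) {
        NodeList aliases = doc.getElementsByTagNameNS(W, "alias");
        for (int i = 0; i < aliases.getLength(); i++) {
            Element a = (Element) aliases.item(i);
            if (alias.equals(a.getAttributeNS(W, "val"))) {
                return (Element) a.getParentNode();
            }
        }
        return null;
    }

    /** Sets (or creates) the w:tag of the matching control and returns the updated XML. */
    static String setTagByAlias(String documentXml, String alias, String tagValue) {
        try {
            Document doc = parse(documentXml);
            Element sdtPr = sdtPrByAlias(doc, alias);
            if (sdtPr != null) {
                NodeList tags = sdtPr.getElementsByTagNameNS(W, "tag");
                Element tag;
                if (tags.getLength() > 0) {
                    tag = (Element) tags.item(0);
                } else {
                    tag = doc.createElementNS(W, "w:tag");
                    sdtPr.appendChild(tag); // appended last; schema order is not checked
                }
                tag.setAttributeNS(W, "w:val", tagValue);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not update document XML", e);
        }
    }

    /** Returns the tag value of the control with the given alias, or null if absent. */
    static String tagOf(String documentXml, String alias) {
        try {
            Element sdtPr = sdtPrByAlias(parse(documentXml), alias);
            if (sdtPr == null) return null;
            NodeList tags = sdtPr.getElementsByTagNameNS(W, "tag");
            return tags.getLength() == 0 ? null : ((Element) tags.item(0)).getAttributeNS(W, "val");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document XML", e);
        }
    }

    public static void main(String[] args) {
        String updated = setTagByAlias(SAMPLE, "dj1", "od:xpath=dj1");
        System.out.println(tagOf(updated, "dj1"));
    }
}
```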

Re: populate or retrieve word using field

PostPosted: Thu Aug 23, 2012 5:24 pm
by buptstehc
Thanks jason, it works. I have another question about converting a content control's content to HTML, which I will post in a new thread. :)