
Examples/pointers towards help for reading/writing sdt

Posted: Fri Apr 23, 2021 12:06 am
by JonCraig
So I'm tasked with converting a program I wrote in C# - which takes a PDF file, looks for certain named fields, and either extracts their values, or writes to their values, depending on whether the user is "inputting" or "outputting."

We'd rather use Word docs for this and we're moving our entire system to Java anyway so I found docx4j ... seems that it can do what I need, but I'm not finding examples and using the online demo/doc explorer is not overly helpful. (SO. MANY. TAGS.)

Basically, what I need to do is either get the current value of a content control by name (either w:tag or w:alias) or set that value (also by tag/alias, of course). Are there examples out there somewhere, or clearer documentation on how to achieve this? The huge amount of nesting and such is making it very hard to figure out how to proceed.

Here's an example field from the test document I'm playing with, as displayed by the online demo:

Online Demo wrote:
<w:sdt>
  <w:sdtPr>
    <w:rPr>
      <w:sz w:val="20"/>
      <w:szCs w:val="20"/>
    </w:rPr>
    <w:alias w:val="jmc-enc-type-title"/>
    <w:tag w:val="jmc-enc-type-tag"/>
    <w:id w:val="1583019566"/>
    <w:placeholder>
      <w:docPart w:val="DefaultPlaceholder_1081868575"/>
    </w:placeholder>
    <w:showingPlcHdr/>
    <w:comboBox>
      <w:listItem w:value="Choose an item."/>
      <w:listItem w:displayText="Evaluation" w:value="Evaluation"/>
      <w:listItem w:displayText="Dispensing" w:value="Dispensing"/>
      <w:listItem w:displayText="Follow Up" w:value="Follow Up"/>
    </w:comboBox>
  </w:sdtPr>
  <w:sdtEndPr/>
  <w:sdtContent>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2366" w:type="dxa"/>
        <w:vAlign w:val="center"/>
      </w:tcPr>
      <w:p w14:paraId="12A46C18" w14:textId="77777777">
        <w:pPr>
          <w:rPr>
            <w:sz w:val="20"/>
            <w:szCs w:val="20"/>
          </w:rPr>
        </w:pPr>
        <w:r>
          <w:rPr>
            <w:rStyle w:val="PlaceholderText"/>
          </w:rPr>
          <w:t>Choose an item.</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:sdtContent>
</w:sdt>


Here the alias and tag are set to "jmc-enc-type-title" and "jmc-enc-type-tag" respectively (they are called title/tag in Word). I would need to either read what the combobox is currently set to (the selected value), or set it to a new value.

Re: Examples/pointers towards help for reading/writing sdt

Posted: Fri Apr 23, 2021 10:41 am
by jason
Here's some stuff I sent someone recently on getting started with OpenDoPE, which is docx4j's way of data binding content controls.

You may want to use that to avoid re-inventing the wheel. Alternatively, there's a fair bit of code in https://github.com/plutext/docx4j/tree/ ... atastorage which you can study to see how to handle content controls. At a high level you need to traverse your document, and you can do this either in Java, or via XSLT. Before the binding is done, OpenDoPEHandler manipulates the docx, removing conditional content which isn't required and duplicating repeating sdts. It recursively processes the content.

It doesn't handle your comboBox stuff; that might be useful for users interacting with the document, but is probably unnecessary if documents are generated non-interactively.
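For the interactive case, the selected value can in principle be read straight from the markup. The following is only a rough sketch using the JDK's DOM parser on the raw XML rather than any docx4j API. It assumes that the w:showingPlcHdr flag means nothing has been chosen yet, and that otherwise the text in w:sdtContent is the display text of the chosen w:listItem (or free text, which a combo box allows). The class ComboBoxValue, the sample() helper and the 'FU' value are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ComboBoxValue {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Hypothetical helper: builds a cut-down copy of the control from this thread. */
    static Element sample(boolean placeholder, String text) {
        String xml = "<w:sdt xmlns:w='" + W + "'><w:sdtPr>"
            + "<w:tag w:val='jmc-enc-type-tag'/>"
            + (placeholder ? "<w:showingPlcHdr/>" : "")
            + "<w:comboBox>"
            + "<w:listItem w:value='Choose an item.'/>"
            + "<w:listItem w:displayText='Evaluation' w:value='Evaluation'/>"
            + "<w:listItem w:displayText='Follow Up' w:value='FU'/>" // 'FU' is invented
            + "</w:comboBox></w:sdtPr>"
            + "<w:sdtContent><w:p><w:r><w:t>" + text + "</w:t></w:r></w:p></w:sdtContent>"
            + "</w:sdt>";
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // needed so the w: prefix resolves to a namespace
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** First direct child of 'parent' named w:<name>, or null (also for a null parent). */
    static Element child(Element parent, String name) {
        if (parent == null) return null;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W.equals(n.getNamespaceURI())
                    && name.equals(n.getLocalName())) return (Element) n;
        }
        return null;
    }

    /**
     * Returns the w:value of the list item whose display text is shown, the raw
     * text if no list item matches, or null while the placeholder is showing.
     */
    static String selectedValue(Element sdt) {
        Element pr = child(sdt, "sdtPr");
        if (child(pr, "showingPlcHdr") != null) return null;

        // Collect the visible text from the w:t elements only.
        StringBuilder shown = new StringBuilder();
        Element content = child(sdt, "sdtContent");
        if (content != null) {
            NodeList ts = content.getElementsByTagNameNS(W, "t");
            for (int i = 0; i < ts.getLength(); i++) shown.append(ts.item(i).getTextContent());
        }

        Element list = child(pr, "comboBox");
        if (list == null) list = child(pr, "dropDownList");
        if (list != null) {
            for (Node n = list.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!(n instanceof Element) || !"listItem".equals(n.getLocalName())) continue;
                Element item = (Element) n;
                if (item.hasAttributeNS(W, "displayText")
                        && shown.toString().equals(item.getAttributeNS(W, "displayText"))) {
                    return item.getAttributeNS(W, "value");
                }
            }
        }
        return shown.toString();
    }

    public static void main(String[] args) {
        System.out.println(selectedValue(sample(true, "Choose an item."))); // null
        System.out.println(selectedValue(sample(false, "Follow Up")));      // FU
        System.out.println(selectedValue(sample(false, "Walk-in")));        // Walk-in
    }
}
```

Returning null for the placeholder keeps "nothing chosen" distinct from a real selection, which matters if the default prompt text is ever localised or edited.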

--------------


For an overview of OpenDoPE, see https://opendope.org/approach_our.html and https://opendope.org/opendope_conventions_v2.3.html

The basic runtime example (to generate an instance document) is at https://github.com/plutext/docx4j/blob/ ... sions.java

The sample document and data can be found in https://github.com/plutext/docx4j/tree/ ... atabinding

Regarding authoring tools, see https://opendope.org/implementations.html

The one you want is

1. An authoring tool aimed at less technical users is available from http://www.opendope.org/downloads/autho ... /setup.exe This tool uses the drag/drop approach Microsoft has introduced in Word 2013. The source code is available on GitHub.

Install it in Word, then use it to add an XML data part to your docx (new or existing), then you can start binding content controls.

Once you have a Word document, you can test the bindings by injecting different data (see the Java sample above).

Re: Examples/pointers towards help for reading/writing sdt

Posted: Sat Apr 24, 2021 2:36 am
by JonCraig
Hmmm - I suppose I should go into what this is used for. It's designed to facilitate entry of data into our system (a full medical office management package) by users in locations that can't access the full system. They make some entries into a template document and then once back at the office, the document is processed and the entries brought into the system. The "output" portion takes some data from the system and pre-fills a template document that a user then finishes.

I think data binding is overkill here; I really just need a way to get the data in these content controls into normal Java variables/objects. It seems the way Word documents are set up makes this way more difficult than it needs to be. It's easy with PDFs, but we need to move to using Word docs for reasons beyond my control.

Re: Examples/pointers towards help for reading/writing sdt

Posted: Tue Apr 27, 2021 11:00 am
by jason
ok, so the interactive use case. Conceptually, you can do this in 2 ways.

1. use a Word document as the user interface, as you have chosen. Obviously the user needs Word (or maybe a substitute such as LibreOffice), and the interface you can offer is constrained by what is possible in Word.

or 2. use a forms-based interface which does not rely on Word. In the OpenDoPE world, we generate a web-based form (an XForm) from the docx template, and then use this to collect response data (in XML format), which can then be used as-is, or applied to the docx template to create an instance document.

Given that you want to do 1 above, then I can see why you want to roll your own code to get the info out of the content controls. This isn't hard once you get your head around the idea of traversing the document. Traverse the document (recursively), and when you encounter a content control object, handle it. There is the commonly used TraversalUtil, of which https://github.com/plutext/docx4j/blob/ ... inder.java is a good example, and also a visitor version: https://github.com/plutext/docx4j/blob/ ... anish.java
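To make the traversal idea concrete, here is a minimal sketch of the same pattern using only the JDK's DOM classes on the raw XML of the main document part (conventionally word/document.xml inside the .docx zip). It is not the docx4j TraversalUtil API, just the recursive "walk, and handle each content control you meet" shape. The class SdtWalker, its methods and the nested sample input are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SdtWalker {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical test input: one content control nested inside another.
    static final String SAMPLE =
        "<w:body xmlns:w='" + W + "'>"
      + "<w:sdt><w:sdtPr><w:alias w:val='Outer title'/><w:tag w:val='outer'/></w:sdtPr>"
      + "<w:sdtContent><w:p>"
      + "<w:sdt><w:sdtPr><w:tag w:val='inner'/></w:sdtPr>"
      + "<w:sdtContent><w:r><w:t>hello</w:t></w:r></w:sdtContent></w:sdt>"
      + "</w:p></w:sdtContent></w:sdt>"
      + "</w:body>";

    /** Parses WordprocessingML with namespace support switched on. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** True if 'n' is an element named w:<localName>. */
    static boolean isW(Node n, String localName) {
        return n.getNodeType() == Node.ELEMENT_NODE
            && W.equals(n.getNamespaceURI())
            && localName.equals(n.getLocalName());
    }

    /** Depth-first walk; every w:sdt met on the way is appended to 'found'. */
    static void walk(Node node, List<Element> found) {
        if (isW(node, "sdt")) {
            found.add((Element) node); // "handle it": here we simply collect it
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, found); // keep descending, since content controls can nest
        }
    }

    /** w:val of a direct w:sdtPr child such as "tag" or "alias", or null if absent. */
    static String property(Element sdt, String name) {
        for (Node pr = sdt.getFirstChild(); pr != null; pr = pr.getNextSibling()) {
            if (!isW(pr, "sdtPr")) continue;
            for (Node p = pr.getFirstChild(); p != null; p = p.getNextSibling()) {
                if (isW(p, name)) return ((Element) p).getAttributeNS(W, "val");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Element> found = new ArrayList<>();
        walk(parse(SAMPLE), found);
        for (Element sdt : found) {
            // prints "outer / Outer title" then "inner / null"
            System.out.println(property(sdt, "tag") + " / " + property(sdt, "alias"));
        }
    }
}
```

Looking only at the direct w:sdtPr child matters here: a descendant search from the outer control would also see the inner control's tag.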

Re: Examples/pointers towards help for reading/writing sdt

Posted: Thu Apr 29, 2021 9:33 pm
by JonCraig
jason wrote:Given that you want to do 1 above, then I can see why you want to roll your own code to get the info out of the content controls. This isn't hard once you get your head around the idea of traversing the document. Traverse the document (recursively), and when you encounter a content control object, handle it.


It's unfortunate that it has to be done this way. All of the PDF libraries I've seen (in the C# world anyway) simply have a Collection containing all of the fields as part of the object representing the PDF. I wonder why it's not the same with .docx?

Re: Examples/pointers towards help for reading/writing sdt

Posted: Sat May 01, 2021 7:40 am
by jason
JonCraig wrote:It's unfortunate that it has to be done this way. All of the PDF libraries I've seen (in the C# world anyway) simply have a Collection containing all of the fields as part of the object representing the PDF. I wonder why it's not the same with .docx?


It would be simple to do a basic implementation of this (i.e. a map of sdts by tag or alias, and a method allowing you to inject a new text value), a couple of hours' work, but it wouldn't handle repeating data, removal of conditional content, insertion of images, etc. Just variable replacement. It doesn't exist as a standard part of docx4j thus far because of its limited utility.
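As a rough illustration of what such a basic implementation might look like, the sketch below builds a map of content controls keyed by w:tag and offers a getter and setter for the text. It is written against the JDK's DOM classes on the raw document XML, not docx4j, and the class SdtFields and its methods are hypothetical. It assumes simple text-like controls that already contain at least one w:t. Writing the XML back into the .docx package is not shown, and the PlaceholderText run style is left untouched.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SdtFields {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Cut-down copy of the control posted earlier in this thread.
    static final String SAMPLE =
        "<w:body xmlns:w='" + W + "'><w:sdt><w:sdtPr>"
      + "<w:alias w:val='jmc-enc-type-title'/><w:tag w:val='jmc-enc-type-tag'/>"
      + "<w:showingPlcHdr/></w:sdtPr>"
      + "<w:sdtContent><w:tc><w:p><w:r><w:t>Choose an item.</w:t></w:r></w:p></w:tc>"
      + "</w:sdtContent></w:sdt></w:body>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** First direct child of 'parent' named w:<name>, or null (also for a null parent). */
    static Element child(Element parent, String name) {
        if (parent == null) return null;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W.equals(n.getNamespaceURI())
                    && name.equals(n.getLocalName())) return (Element) n;
        }
        return null;
    }

    /** Indexes every tagged w:sdt by its w:tag value; a later duplicate replaces an earlier one. */
    static Map<String, Element> byTag(Document doc) {
        Map<String, Element> map = new LinkedHashMap<>();
        NodeList sdts = doc.getElementsByTagNameNS(W, "sdt");
        for (int i = 0; i < sdts.getLength(); i++) {
            Element sdt = (Element) sdts.item(i);
            Element tag = child(child(sdt, "sdtPr"), "tag");
            if (tag != null) map.put(tag.getAttributeNS(W, "val"), sdt);
        }
        return map;
    }

    /** Concatenates the w:t text inside the control's w:sdtContent. */
    static String getText(Element sdt) {
        StringBuilder sb = new StringBuilder();
        Element content = child(sdt, "sdtContent");
        if (content == null) return "";
        NodeList ts = content.getElementsByTagNameNS(W, "t");
        for (int i = 0; i < ts.getLength(); i++) sb.append(ts.item(i).getTextContent());
        return sb.toString();
    }

    /** Puts 'value' in the first w:t, blanks any others, and drops the placeholder flag. */
    static void setText(Element sdt, String value) {
        Element content = child(sdt, "sdtContent");
        NodeList ts = content == null ? null : content.getElementsByTagNameNS(W, "t");
        if (ts == null || ts.getLength() == 0) {
            throw new IllegalStateException("control has no w:t to write into");
        }
        for (int i = 0; i < ts.getLength(); i++) {
            ts.item(i).setTextContent(i == 0 ? value : "");
        }
        Element pr = child(sdt, "sdtPr");
        Element flag = child(pr, "showingPlcHdr");
        if (flag != null) pr.removeChild(flag);
    }

    public static void main(String[] args) {
        Map<String, Element> fields = byTag(parse(SAMPLE));
        Element field = fields.get("jmc-enc-type-tag");
        System.out.println(getText(field)); // Choose an item.
        setText(field, "Evaluation");
        System.out.println(getText(field)); // Evaluation
    }
}
```

Keying by w:alias instead would only need "alias" in place of "tag" in byTag. As noted above, this is variable replacement only: repeats, conditional content and images are out of scope.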