Is there a function to Validate .docx files are valid?

Posted: Sat Dec 18, 2010 2:17 am
by DashV
Does docx4j have a function I can call to validate a docx document with respect to the Specification/Schema? Does this happen automatically when WordprocessingMLPackage.load() is called? How descriptive is Docx4JException? Will I get a detailed description of what's wrong?

I want to use docx4j to process and modify docx documents created with MS Office as well as other 3rd party programs and want to make sure first and foremost that what I'm about to process/modify is good to begin with. If it's not good I want to have some way to indicate where/why it's broken so I can report it to the user.

I read the javadoc and checked the forums and didn't find an answer.

Looks like a fantastic project.

Re: Is there a function to Validate .docx files are valid?

Posted: Mon Dec 20, 2010 4:09 pm
by jason
You need to worry about validity in the XML sense, and also semantic constraints (described only in the text of the spec, or in the Microsoft Office implementations).

docx4j will tell you when it encounters an unexpected element. This shouldn't happen if you have created the docx in Word 2007. (Certain Word 2010 elements will be dropped, which is what Word 2007 does with them as well.)

org.docx4j.jaxb.JaxbValidationEventHandler is typically set as the event handler on the Unmarshaller. You can substitute your own handler if you wish.

There is a class org.docx4j.jaxb.WmlSchema, which isn't currently used, that you could use as the basis for validating your main document part after marshalling it.

You'd be validating against xsd/wml/wml.xsd and the various schemas it imports. If you do modify this to validate, please let us know how it goes.
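For anyone who wants to try this, here is a rough sketch of what validating against an XSD could look like using only the JDK's javax.xml.validation API. It is not a docx4j function: the class name SchemaCheck and the method validate are made up for illustration, and how you obtain the marshalled XML of the main document part is left out. The idea is to collect every problem with its line and column rather than stopping at the first one, so it can be reported to the user.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper for illustration; not part of docx4j.
class SchemaCheck {

    /** Validates xml against the given XSD and returns one message per problem found. */
    static List<String> validate(Source schemaSource, Source xml)
            throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(schemaSource);
        Validator validator = schema.newValidator();

        final List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add(describe("warning", e));
            }

            @Override
            public void error(SAXParseException e) {
                // Recoverable: record it and let validation continue.
                problems.add(describe("error", e));
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                // Not recoverable (e.g. malformed XML): stop validating.
                throw e;
            }
        });

        try {
            validator.validate(xml);
        } catch (SAXParseException e) {
            // Validation was aborted; record the problem that stopped it.
            problems.add(describe("fatal", e));
        }
        return problems;
    }

    private static String describe(String severity, SAXParseException e) {
        return severity + " at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }
}
```

To point this at the schema mentioned above, you would pass something like new StreamSource(new File("xsd/wml/wml.xsd")) as the schema source. Building the StreamSource from a File gives it a system ID, which the schema factory needs in order to resolve the relative imports in wml.xsd. An empty list means the XML is schema-valid; it says nothing about the extra semantic constraints.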

If you are creating exotic structures with docx4j and find that Word can't open them, that can be a bit of a pain, since Word (2007 at least) gives very little feedback as to what it finds objectionable. Validation might help you to see what the problem is, but I typically run up against the extra constraints instead.

cheers .. Jason