
Upgrade from 2.6.0 to 2.8.1 corruption problem

Posted: Thu Nov 01, 2012 3:06 am
by arimmer
I'm trying to upgrade the version of docx4j we use from 2.6.0 to 2.8.1, but have encountered a problem.

When I just change the jar file, everything compiles and runs okay, but when I try to open the resulting docx files in Word 2007 it complains that the files are corrupt. The exact message is:
The file part1.docx cannot be opened because there are problems with the contents.
Details: The file is corrupt and cannot be opened.


If I dismiss this message, confirm that I trust the source, and decline the offer to search the Microsoft website for something to open it with, the file then opens, apparently without problems.

Has anyone seen anything like this before and if so could they point me towards what might be the problem?

The files we're producing include tables and figures (which reference .png files) - has anything to do with processing these changed?

I'd really appreciate any help or pointers as to where the problem might be. We really need to upgrade so we can use bookmarks for hyperlinks, but we can't afford to break what is currently working.

[Running one of the sample programs (the bookmarks example) using 2.8.1 produces a docx file which opens fine. ]

Re: Upgrade from 2.6.0 to 2.8.1 corruption problem

Posted: Thu Nov 01, 2012 7:18 am
by jason
If you could post one of the problematic docx files (as short as possible pls), and a high level description of what your code does, I'll take a look at the file to see what Word might not like.

Does your code operate on some input docx? If so, have you verified that Word is happy with the input docx?

Re: Upgrade from 2.6.0 to 2.8.1 corruption problem

Posted: Fri Nov 02, 2012 9:50 pm
by arimmer
In trying to produce a small example docx file with the problem, I have discovered that only some of our docx output has this problem.

At a high level, what we're doing for the problem files is producing a report containing text which originally exists as HTML and which we programmatically convert to docx format using docx4j. Some of the report content is also directly generated by the user/application. The styles.xml and numbering.xml are files which already exist and are 'copied' into the docx file.

I will continue looking into this to try and discover what we are doing differently for the 'problem' files as compared to the others.

In the meantime I have attached a small file which shows the problem, in the hope that you may be able to spot the cause of the problem.

(I edited the text of a couple of the constituent XML files directly to remove sensitive data: I renamed the docx to zip, edited the files, updated the archive, then renamed it back to docx. The file remained 'broken'. Making similar changes to a docx file which does not exhibit the problem did not break that file; it still opened fine in Word 2007.)

Re: Upgrade from 2.6.0 to 2.8.1 corruption problem

Posted: Fri Nov 02, 2012 10:23 pm
by jason
The problem is the contents of the core properties part (core.xml), specifically:

  <w:simpleLiteral xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xml:lang="Some Text Here"/>
 


and

  <dc:creator xml:lang=""/>
 


Remove those two elements and it opens OK in Word 2010.
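
For anyone wanting to check their own output for the same two symptoms, here is a rough sketch that uses only the JDK (no docx4j). It assumes the core properties part lives at the usual `docProps/core.xml` path inside the zip, and it flags top-level elements that carry an `xml:lang` attribute or sit outside the core-properties and Dublin Core namespaces. The class and method names are made up for illustration, and the two rules are only a heuristic modelled on the elements quoted above, not a full validation of the part.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
class CorePropsCheck {
    static final String NS_CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
    static final String NS_DC = "http://purl.org/dc/elements/1.1/";
    static final String NS_DCTERMS = "http://purl.org/dc/terms/";
    static final String NS_XML = "http://www.w3.org/XML/1998/namespace";

    private static DocumentBuilderFactory factory() {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed so namespace URIs are reported
        return dbf;
    }

    // Inspect the direct children of the root element of core.xml.
    static List<String> check(Document doc) {
        List<String> findings = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (!(n instanceof Element)) {
                continue; // skip whitespace text nodes and comments
            }
            Element e = (Element) n;
            String ns = e.getNamespaceURI();
            if (!NS_CP.equals(ns) && !NS_DC.equals(ns) && !NS_DCTERMS.equals(ns)) {
                findings.add(e.getNodeName() + ": unexpected namespace " + ns);
            }
            if (e.hasAttributeNS(NS_XML, "lang")) {
                findings.add(e.getNodeName() + ": carries xml:lang=\""
                        + e.getAttributeNS(NS_XML, "lang") + "\"");
            }
        }
        return findings;
    }

    // Convenience overload for XML held in a string.
    static List<String> findSuspects(String coreXml) throws Exception {
        return check(factory().newDocumentBuilder()
                .parse(new InputSource(new StringReader(coreXml))));
    }

    // Assumption: the part is stored at docProps/core.xml inside the package.
    static List<String> findSuspectsInDocx(String docxPath) throws Exception {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("docProps/core.xml");
            if (entry == null) {
                return new ArrayList<>(); // no core properties part found
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return check(factory().newDocumentBuilder().parse(in));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java CorePropsCheck.java file.docx");
            return;
        }
        for (String finding : findSuspectsInDocx(args[0])) {
            System.out.println(finding);
        }
    }
}
```

Run against the attached file, a check like this should report both the `w:simpleLiteral` element and the `dc:creator` element shown above.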

Re: Upgrade from 2.6.0 to 2.8.1 corruption problem

Posted: Fri Nov 02, 2012 11:13 pm
by arimmer
Thanks.

We're trying to use these to store the creator and title of the report, using the following code:
DocPropsCorePart docPropsCorePart = new DocPropsCorePart();
CoreProperties coreProperties = docPropsObjectFactory.createCoreProperties();
docPropsCorePart.setJaxbElement( coreProperties );
//---- the next section causes word 2007 to report as corrupt with docx4j 2.8.1
SimpleLiteral titleLiteral = new SimpleLiteral( );
if (title != null)
{
  titleLiteral.setLang( title );
}
coreProperties.setTitle( DOCXHelper.getWrappedSimpleLiteral( titleLiteral ));

SimpleLiteral creator = new SimpleLiteral();
if (author != null)
{
  creator.setLang( author );
}
coreProperties.setCreator( creator );
//---- the above section causes word 2007 to report as corrupt with docx4j 2.8.1
if (author != null)
{
  coreProperties.setLastModifiedBy( author );
}

coreProperties.setRevision( "1" );

return docPropsCorePart;


I don't know why the lang attribute is being used for the value, as the developer did not document the choice and is not available to ask.

Changing the code to put the value in the content solves the problem for creator but not for title.

The extra code called when setting the title is:
public static JAXBElement<SimpleLiteral> getWrappedSimpleLiteral( SimpleLiteral simpleLiteral )
  {
    return new JAXBElement<SimpleLiteral>( new QName( Namespaces.NS_WORD12, "simpleLiteral"), SimpleLiteral.class, simpleLiteral );
  }


How should we be setting the title?

Re: Upgrade from 2.6.0 to 2.8.1 corruption problem

Posted: Sat Nov 03, 2012 9:13 pm
by jason
org.docx4j.docProps.core.dc.elements.ObjectFactory of = new org.docx4j.docProps.core.dc.elements.ObjectFactory();
SimpleLiteral literal = of.createSimpleLiteral();
literal.getContent().add("some title");
core.getJaxbElement().setTitle(of.createTitle(literal));
 
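
To make the intended shape of the part concrete, here is a hedged, JDK-only sketch (no docx4j involved) that builds a core properties document in which the title and creator are stored as element content of `dc:title` and `dc:creator`, with no `xml:lang` attribute. The class name and helper are hypothetical, and the XML that docx4j actually writes may differ in prefixes and ordering; the point is only to show values going in the content rather than in an attribute, and the element living in the Dublin Core namespace rather than the WordprocessingML one.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration class, not part of docx4j.
class CorePropsShape {
    static final String NS_CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
    static final String NS_DC = "http://purl.org/dc/elements/1.1/";
    static final String NS_XMLNS = "http://www.w3.org/2000/xmlns/";

    // Build a minimal core.xml with title and creator as element content.
    static String buildCoreXml(String title, String creator) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().newDocument();

        Element root = doc.createElementNS(NS_CP, "cp:coreProperties");
        // Declare both prefixes once on the root element.
        root.setAttributeNS(NS_XMLNS, "xmlns:cp", NS_CP);
        root.setAttributeNS(NS_XMLNS, "xmlns:dc", NS_DC);
        doc.appendChild(root);

        appendDcElement(doc, root, "dc:title", title);
        appendDcElement(doc, root, "dc:creator", creator);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // A null value means the element is left out entirely,
    // rather than written as an empty element.
    private static void appendDcElement(Document doc, Element parent,
                                        String qualifiedName, String value) {
        if (value == null) {
            return;
        }
        Element e = doc.createElementNS(NS_DC, qualifiedName);
        e.setTextContent(value); // the value goes in the content, not in xml:lang
        parent.appendChild(e);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildCoreXml("some title", "some author"));
    }
}
```

Leaving the element out when the value is null is a design choice of this sketch; it avoids writing an empty `dc:creator`, which is the situation the original code ended up in when `author` was null.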

Re: Upgrade from 2.6.0 to 2.8.1 corruption problem

Posted: Sun Nov 04, 2012 12:48 pm
by jason
OpcPackage now contains:

/**
 * @since 2.8.2
 */
public void setTitle(String title)
 

Re: Upgrade from 2.6.0 to 2.8.1 corruption problem

Posted: Mon Nov 05, 2012 9:07 pm
by arimmer
Thank you for your help - that works fine.