Error unexpected element using loading xlsx files

Posted: Wed Jun 12, 2019 6:26 pm
by LarsD
Hi.

When I open an xlsx file, I get the following stack trace.

2019-06-12 09:21:56,924 INFO  [org.docx4j.jaxb.Context] (main) java.vendor=Oracle Corporation
2019-06-12 09:21:56,924 INFO  [org.docx4j.jaxb.Context] (main) java.version=1.8.0_212
2019-06-12 09:21:57,173 INFO  [org.docx4j.jaxb.Context] (main) MOXy JAXB implementation intended..
2019-06-12 09:22:02,617 INFO  [org.docx4j.jaxb.Context] (main) MOXy JAXB implementation is in use!
2019-06-12 09:22:02,932 INFO  [org.docx4j.XmlUtils] (main) setProperty com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
2019-06-12 09:22:02,933 INFO  [org.docx4j.XmlUtils] (main) actual: com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
2019-06-12 09:22:02,933 INFO  [org.docx4j.XmlUtils] (main) setProperty com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
2019-06-12 09:22:02,933 INFO  [org.docx4j.XmlUtils] (main) actual: com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
2019-06-12 09:22:02,956 INFO  [org.docx4j.openpackaging.contenttype.ContentTypeManager] (main) Detected SpreadhseetMLPackage package
2019-06-12 09:22:02,961 INFO  [org.docx4j.openpackaging.contenttype.ContentTypeManager] (main) Detected SpreadhseetMLPackage package
2019-06-12 09:22:02,962 INFO  [org.docx4j.openpackaging.io3.Load3] (main) Instantiated package of type org.docx4j.openpackaging.packages.SpreadsheetMLPackage
2019-06-12 09:22:02,969 INFO  [org.docx4j.utils.XPathFactoryUtil] (main) xpath implementation: org.apache.xpath.jaxp.XPathFactoryImpl
2019-06-12 09:22:02,975 INFO  [org.docx4j.openpackaging.packages.SpreadsheetMLPackage] (main) Set shortcut for docPropsExtendedPart
2019-06-12 09:22:02,976 INFO  [org.docx4j.openpackaging.packages.SpreadsheetMLPackage] (main) Set shortcut for docPropsCorePart
2019-06-12 09:22:02,985 INFO  [org.xlsx4j.jaxb.Context] (main) java.vendor=Oracle Corporation
2019-06-12 09:22:02,985 INFO  [org.xlsx4j.jaxb.Context] (main) java.version=1.8.0_212
2019-06-12 09:22:03,040 INFO  [org.docx4j.jaxb.NamespacePrefixMapperUtils] (main) Using MOXy NamespacePrefixMapper
2019-06-12 09:22:03,045 INFO  [org.xlsx4j.jaxb.Context] (main) MOXy JAXB implementation intended..
2019-06-12 09:22:03,942 INFO  [org.xlsx4j.jaxb.Context] (main) MOXy JAXB implementation is in use!
2019-06-12 09:22:03,943 INFO  [org.docx4j.openpackaging.packages.SpreadsheetMLPackage] (main) Set shortcut for WorkbookPart
2019-06-12 09:22:03,952 INFO  [org.docx4j.openpackaging.io3.Load3] (main) package read;  elapsed time: 7069 ms
2019-06-12 09:22:03,983 WARN  [org.docx4j.jaxb.JaxbValidationEventHandler] (main) [ERROR] : unexpected element (uri:"http://schemas.openxmlformats.org/markup-compatibility/2006", local:"AlternateContent"). Expect
2019-06-12 09:22:03,984 WARN  [org.docx4j.jaxb.JaxbValidationEventHandler] (main) Column is 530 at line number 2
2019-06-12 09:22:03,986 INFO  [org.docx4j.jaxb.JaxbValidationEventHandler] (main) shouldContinue is set to false
2019-06-12 09:22:04,003 WARN  [org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware] (main)
Exception Description: An error occurred unmarshalling the document
Internal Exception: org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 530; unexpected element (uri:"http://schemas.openxmlformats.org/markup-compatibility/2006", local:"AlternateContent"). Expected elements are <{http://schemas.openxmlformats.org/spreadsheetml/2006/main}fileVersion>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}fileSharing>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}workbookPr>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}workbookProtection>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}bookViews>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}sheets>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}functionGroups>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}externalReferences>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}definedNames>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}calcPr>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}oleSize>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}customWorkbookViews>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}pivotCaches>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}smartTagPr>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}smartTagTypes>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}webPublishing>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}fileRecoveryPr>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}webPublishObjects>,<{http://schemas.openxmlformats.org/spreadsheetml/2006/main}extLst>
2019-06-12 09:22:04,004 INFO  [org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware] (main) encountered unexpected content in /xl/workbook.xml; pre-processing
2019-06-12 09:22:04,016 WARN  [org.docx4j.utils.ResourceUtils] (main) Couldn't get resource: custom-preprocessor.xslt
2019-06-12 09:22:04,017 WARN  [org.docx4j.utils.ResourceUtils] (main) custom-preprocessor.xslt: custom-preprocessor.xslt not found via classloader.
2019-06-12 09:22:04,018 WARN  [org.docx4j.utils.ResourceUtils] (main) Property docx4j.jaxb.JaxbValidationEventHandler resolved to missing resource custom-preprocessor.xslt; using org/docx4j/jaxb/mc-preprocessor.xslt
2019-06-12 09:22:04,388 INFO  [org.docx4j.XmlUtils] (main) Using org.apache.xalan.transformer.TransformerImpl
2019-06-12 09:22:04,388 INFO  [org.docx4j.XmlUtils] (main) Working around https://issues.apache.org/jira/browse/XALANJ-2419
method: xml
2019-06-12 09:22:04,441 WARN  [org.docx4j.utils.XSLTUtils] (main) Found some mc:AlternateContent
2019-06-12 09:22:04,444 WARN  [org.docx4j.utils.XSLTUtils] (main) Missing mc:Fallback!  Dropping the mc:AlternateContent entirely.
2019-06-12 09:22:12,803 INFO  [org.docx4j.jaxb.NamespacePrefixMapperUtils] (main) Using MOXy NamespacePrefixMapper
2019-06-12 09:22:12,809 INFO  [org.docx4j.openpackaging.parts.DocPropsExtendedPart] (main) unmarshalling org.docx4j.openpackaging.parts.DocPropsExtendedPart
2019-06-12 09:22:12,830 INFO  [org.docx4j.openpackaging.parts.DocPropsCorePart] (main) unmarshalling org.docx4j.openpackaging.parts.DocPropsCorePart


This part:
2019-06-12 09:22:03,983 WARN [org.docx4j.jaxb.JaxbValidationEventHandler] (main) [ERROR] : unexpected element (uri:"http://schemas.openxmlformats.org/markup-compatibility/2006", local:"AlternateContent"). Expect
2019-06-12 09:22:03,984 WARN [org.docx4j.jaxb.JaxbValidationEventHandler] (main) Column is 530 at line number 2
2019-06-12 09:22:03,986 INFO [org.docx4j.jaxb.JaxbValidationEventHandler] (main) shouldContinue is set to false

looks strange to me!

I think the other warnings after it may be a result of this problem.

I tried different files; always the same issue.

Btw: Column 530 does not exist in those files!!

kind regards
Lars

Re: Error unexpected element using loading xlsx files

Posted: Thu Jun 13, 2019 6:48 am
by jason
Could you post an xlsx exhibiting this?

The message unexpected element (uri:"http://schemas.openxmlformats.org/markup-compatibility/2006", local:"AlternateContent") means that that element was encountered, but not expected.

Microsoft uses mc:AlternateContent to extend the schemas beyond the ECMA/ISO standard, without including the mc:AlternateContent element itself in the ECMA/ISO standard.

Our strategy when this is encountered but unexpected is to run the XML through an XSLT which chooses the "fallback". But in your logs, you have: Missing mc:Fallback! Dropping the mc:AlternateContent entirely.
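To make the fallback strategy concrete, here is a minimal sketch of the same idea using plain JDK DOM calls instead of XSLT. It is not docx4j's mc-preprocessor.xslt; the class name McFallbackSketch and the method preprocess are hypothetical, chosen for illustration. Each mc:AlternateContent is replaced by the children of its mc:Fallback, and dropped entirely when there is no mc:Fallback (the case reported in the log above).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of docx4j.
class McFallbackSketch {

    static final String MC_NS =
            "http://schemas.openxmlformats.org/markup-compatibility/2006";

    /** Replaces each mc:AlternateContent with its mc:Fallback children, or drops it if it has none. */
    static String preprocess(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Copy the live NodeList first, because the tree is modified below.
            NodeList found = doc.getElementsByTagNameNS(MC_NS, "AlternateContent");
            List<Element> alternates = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                alternates.add((Element) found.item(i));
            }

            for (Element ac : alternates) {
                Node parent = ac.getParentNode();
                if (parent == null) {
                    continue; // already detached from the tree
                }
                Element fallback = null;
                for (Node n = ac.getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (n instanceof Element
                            && MC_NS.equals(n.getNamespaceURI())
                            && "Fallback".equals(n.getLocalName())) {
                        fallback = (Element) n;
                        break;
                    }
                }
                if (fallback != null) {
                    // Move the fallback content up, in front of the mc:AlternateContent.
                    while (fallback.getFirstChild() != null) {
                        parent.insertBefore(fallback.getFirstChild(), ac);
                    }
                }
                // With or without a fallback, the mc:AlternateContent itself goes.
                parent.removeChild(ac);
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not pre-process XML", e);
        }
    }
}
```

With no mc:Fallback present, the result simply lacks the mc:AlternateContent element, which matches the "Dropping the mc:AlternateContent entirely" message in the log.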

In recent versions of docx4j, the XSLT is invoked less often for docx and pptx, since we've added mc:AlternateContent to the content model for common cases. The same can/should be done for xlsx.

All this said, the xlsx still works (ie gracefully degrades), right? Only the unexpected mc:AlternateContent should be gone.

LarsD wrote: Btw: Column 530 does not exist in those files!!


I bet it does, if you look at it before it is pretty-printed :-)
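To check this yourself, a small sketch like the following prints the text around a reported line/column in the raw, not pretty-printed, part. It uses only the JDK (an xlsx is a zip archive); the class ParseErrorLocator and its methods are hypothetical names, and parsers report positions only approximately, so treat the output as a pointer to the neighbourhood rather than an exact offset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration only; not part of docx4j.
class ParseErrorLocator {

    /** Returns up to radius characters either side of the given 1-based line and column. */
    static String contextAt(String xml, int line, int column, int radius) {
        String[] lines = xml.split("\r?\n", -1);
        if (line < 1 || line > lines.length) {
            throw new IllegalArgumentException("no such line: " + line);
        }
        String text = lines[line - 1];
        // Clamp the column so an out-of-range value still yields a snippet.
        int pos = Math.min(Math.max(column - 1, 0), text.length());
        int from = Math.max(0, pos - radius);
        int to = Math.min(text.length(), pos + radius);
        return text.substring(from, to);
    }

    /** Reads one part, e.g. "xl/workbook.xml", from the xlsx as UTF-8 text. */
    static String readPart(String xlsxPath, String partName) throws IOException {
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            ZipEntry entry = zip.getEntry(partName);
            if (entry == null) {
                throw new IOException("no such part: " + partName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: ParseErrorLocator <file.xlsx>");
            return;
        }
        // Line 2, column 530 are the values from the log in this thread.
        String xml = readPart(args[0], "xl/workbook.xml");
        System.out.println(contextAt(xml, 2, 530, 60));
    }
}
```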

Re: Error unexpected element using loading xlsx files

Posted: Thu Jun 13, 2019 8:40 pm
by LarsD
Hi.

jason wrote: All this said, the xlsx still works (ie gracefully degrades), right? Only the unexpected mc:AlternateContent should be gone.


Not really. As you may remember from our email correspondence, my xlsx file is corrupted after the creation process and I still can't figure out why... :cry: This is going to be a bigger problem soon because the application will be released in September. We bought a large Plutext perpetual license for our customer, so it should work.

I had hoped that fixing this problem would finally make it work.

jason wrote: Btw: Column 530 does not exist in those files!!


Yeah, I think "column" in this case is just the position in the output, not a column of the xlsx file.

I added two files.


kind regards

Lars

Re: Error unexpected element using loading xlsx files

Posted: Tue Jun 18, 2019 9:18 am
by jason
Hi Lars

I had assumed your xlsx problem had gone away. Let's go back to email and get it sorted out ASAP.

kind regards .. Jason

Re: Error unexpected element using loading xlsx files

Posted: Sun Jun 23, 2019 11:43 am
by jason
The main problem with this xlsx is that you have string values in the cell value element of cells typed as numeric (t="n").

For example, at the bottom of sheet1:

         <ns0:c r="BN282" s="20" t="n" cm="0" vm="0">
            <ns0:v>KZ-III 5 - 4173/6t-6t-74t 8/2019</ns0:v>
         </ns0:c>
 


These should instead be inline strings, or entries in the shared strings table.

When Excel 2016 fixes your xlsx, it puts it in the shared strings table:

      <c r="BN282" s="20" t="s">
        <v>612</v>
      </c>
 
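For readers unfamiliar with the shared strings table: with t="s", the value element holds a zero-based index into the workbook's shared strings part rather than the text itself. The following toy model, which uses no xlsx4j classes, sketches that indirection; SharedStringsSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model for illustration only; not the xlsx4j shared strings API.
class SharedStringsSketch {

    private final List<String> table = new ArrayList<>();
    private final Map<String, Integer> index = new HashMap<>();

    /** Returns the zero-based index of the text, adding it to the table on first use. */
    int indexOf(String text) {
        Integer existing = index.get(text);
        if (existing != null) {
            return existing;
        }
        table.add(text);
        int position = table.size() - 1;
        index.put(text, position);
        return position;
    }

    /** Resolves an index found in a cell's value element back to the text. */
    String lookup(int i) {
        return table.get(i);
    }

    /** Renders a cell that refers to the text through the table, using t="s". */
    String cellXml(String ref, String text) {
        return "<c r=\"" + ref + "\" t=\"s\"><v>" + indexOf(text) + "</v></c>";
    }
}
```

In the Excel-repaired file above, 612 is such an index: the original text sits at that position in the shared strings part.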


I wrote some code which remediates by converting to an inline string:

            <c r="BN282" s="20" t="inlineStr" cm="0" vm="0">
                <is>
                    <r>
                        <t>KZ-III 5 - 4173/6t-6t-74t 8/2019</t>
                    </r>
                </is>
            </c>
 


After running it, Excel 2016 opens it without complaint. Here is the code:

import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.SpreadsheetMLPackage;
import org.docx4j.openpackaging.parts.SpreadsheetML.WorkbookPart;
import org.docx4j.openpackaging.parts.SpreadsheetML.WorksheetPart;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.xlsx4j.sml.CTRElt;
import org.xlsx4j.sml.CTRst;
import org.xlsx4j.sml.CTXstringWhitespace;
import org.xlsx4j.sml.Cell;
import org.xlsx4j.sml.Row;
import org.xlsx4j.sml.STCellType;
import org.xlsx4j.sml.SheetData;
import org.xlsx4j.sml.Worksheet;


/**
 * For now, all this does is represent inline strings correctly:
 * any cell whose value does not parse as a number is rewritten
 * as an inline string.
 *
 * @author jharrop
 */
public class CellContentRemediator {

        private static final Logger log = LoggerFactory.getLogger(CellContentRemediator.class);

        public static void main(String[] args) throws Exception {

                String inputfilepath = System.getProperty("user.dir") + "/mat.xlsx";

                // Open a document from the file system
                SpreadsheetMLPackage xlsxPkg = SpreadsheetMLPackage.load(new java.io.File(inputfilepath));

                WorkbookPart workbookPart = xlsxPkg.getWorkbookPart();
                // Only the first worksheet is processed
                WorksheetPart sheet = workbookPart.getWorksheet(0);

                // Now fix strings
                remediateStrings(sheet);

                xlsxPkg.save(new java.io.File(System.getProperty("user.dir") + "/fixed.xlsx"));
        }

        private static void remediateStrings(WorksheetPart sheet) throws Docx4JException {

                Worksheet ws = sheet.getContents();
                SheetData data = ws.getSheetData();

                for (Row r : data.getRow()) {
                        System.out.println("row " + r.getR());

                        for (Cell c : r.getC()) {

                                String cellValue = c.getV();
                                if (cellValue == null) {
                                        continue; // no value, nothing to fix
                                }

                                try {
                                        // Used purely as a test: does the value parse as a number?
                                        Float.parseFloat(cellValue);
                                } catch (NumberFormatException nf) {
                                        // It's not a number; convert to INLINE_STR

                                        CTXstringWhitespace t = new CTXstringWhitespace();
                                        t.setValue(cellValue);
                                        CTRElt rElt = new CTRElt();
                                        rElt.setT(t);

                                        CTRst rst = new CTRst();
                                        rst.getR().add(rElt);

                                        c.setIs(rst);

                                        c.setT(STCellType.INLINE_STR);
                                        c.setV(null);

                                        System.out.println("fixed " + c.getR());
                                }
                        }
                }
        }
}
 
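One caveat about using Float.parseFloat as the "is it a number?" test: it also accepts inputs such as "NaN", "Infinity" and "1f", and ignores surrounding whitespace, so cells holding those strings would be left untouched. If that matters for your data, a stricter check could look like the sketch below. The class name NumericCellCheck and the pattern are assumptions about what a numeric cell value should look like, not something taken from docx4j or from the code above.

```java
import java.util.regex.Pattern;

// Hypothetical stricter numeric test; the pattern is an assumption, not a docx4j rule.
class NumericCellCheck {

    // Optional sign, digits with optional fraction (or a bare fraction), optional exponent.
    private static final Pattern NUMERIC =
            Pattern.compile("[+-]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?");

    /** Returns true only if the whole value is a plain decimal number. */
    static boolean looksNumeric(String value) {
        return value != null && NUMERIC.matcher(value).matches();
    }
}
```

In remediateStrings, the try/catch around Float.parseFloat could then be replaced by a plain if (!NumericCellCheck.looksNumeric(cellValue)) branch.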