
MainDocumentPart getContent always returning empty list

Posted: Wed Jun 29, 2016 5:32 pm
by sai krishna
Hi,

I am trying to parse a docx file, but getContent() on the MainDocumentPart always returns an empty list. Here is my code:

final LoadFromZipNG loader = new LoadFromZipNG();
WordprocessingMLPackage wordMLPackage = (WordprocessingMLPackage)loader.get(new FileInputStream("/mnt/sdcard/LC1.docx"));
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
List<Object> contents = documentPart.getContent();

I checked the document structure with the webapp at http://webapp.docx4java.org/
and the structure looks correct there.
But in my program the content is still an empty list.
Could you please help me find the exact problem and how to resolve it?
I am attaching the document I am parsing.
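A quick way to confirm that the file itself is fine is to count the paragraphs in the main part without going through docx4j at all. The sketch below uses only the JDK's java.util.zip and DOM APIs. The class name DocxParagraphCount is hypothetical, and it assumes the main part sits at the conventional path word/document.xml (strictly, its location is given by the officeDocument relationship in _rels/.rels). If it reports paragraphs while getContent() is empty, the problem lies in loading/unmarshalling rather than in the document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: a docx4j-independent sanity check on the main document part.
final class DocxParagraphCount {
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumes the main part is at word/document.xml (the usual location).
    static int countParagraphs(String docxPath) throws Exception {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for getElementsByTagNameNS to match w:p
            try (InputStream in = zip.getInputStream(entry)) {
                Document doc = dbf.newDocumentBuilder().parse(in);
                // Counts every w:p, including those in tables, text boxes
                // and mc:AlternateContent branches.
                return doc.getElementsByTagNameNS(W_NS, "p").getLength();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countParagraphs(args[0]) + " paragraph(s) in word/document.xml");
    }
}
```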

Re: MainDocumentPart getContent always returning empty list

Posted: Wed Jun 29, 2016 8:00 pm
by jason
Opening your docx with the current (non-Android) docx4j code gives:

Code:
DEBUG org.docx4j.openpackaging.parts.JaxbXmlPart .getContents line 158 - Lazily unmarshalling /word/document.xml
DEBUG org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware .unmarshal line 429 - For org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart, unmarshall (no binder)
WARN org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 88 - [ERROR] : unexpected element (uri:"http://schemas.openxmlformats.org/markup-compatibility/2006", local:"AlternateContent"). Expect
INFO org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 134 - continuing (with possible element/attribute loss)
INFO org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware .unmarshal line 485 - encountered unexpected content in /word/document.xml; pre-processing


So the first thing to check is whether the pre-processing code is present on Android. Yes, it seems to be: https://github.com/plutext/docx4j/blob/ ... essor.xslt

Can you turn logging on? The Android version is ~2.8.0 and uses log4j, so add https://github.com/plutext/docx4j/blob/ ... /log4j.xml to your classpath.

Re: MainDocumentPart getContent always returning empty list

Posted: Thu Jun 30, 2016 5:29 am
by sai krishna
Hi Mr. Jason,

Thanks a lot for your quick response.
I could not find out how to configure logging with log4j.xml in an Android application,
but I have collected the log and attached it. There are a few errors,
but I could not figure out why they occur.
Could you please help me resolve the issue?
Please find the log attached.

Thanks and Regards
Sai Krishna

Re: MainDocumentPart getContent always returning empty list

Posted: Fri Jul 01, 2016 3:01 pm
by jason
Your logs say:

Code:
06-29 23:49:49.537  2973  2973 I System.out: 36312 [main] WARN org.docx4j.XmlUtils  - Using default SAXParserFactory: null
06-29 23:49:49.722  2973  2973 I System.out: 36490 [main] ERROR org.docx4j.XmlUtils  - Attempt to invoke virtual method 'java.lang.Object org.apache.xalan.extensions.ExtensionsTable.extFunction(org.apache.xpath.functions.FuncExtFunction, java.util.Vector, org.apache.xalan.extensions.ExpressionContext)' on a null object reference
06-29 23:49:49.722  2973  2973 I System.out: ; Line#: 54; Column#: 1
06-29 23:49:49.722  2973  2973 I System.out: javax.xml.transform.TransformerException: Attempt to invoke virtual method 'java.lang.Object org.apache.xalan.extensions.ExtensionsTable.extFunction(org.apache.xpath.functions.FuncExtFunction, java.util.Vector, org.apache.xalan.extensions.ExpressionContext)' on a null object reference
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xpath.XPath.execute(XPath.java:365)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemVariable.getValue(ElemVariable.java:274)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemVariable.execute(ElemVariable.java:245)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:370)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:175)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2225)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemCopy.execute(ElemCopy.java:113)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:370)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:175)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2225)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemCopy.execute(ElemCopy.java:113)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:370)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:175)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2225)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemCopy.execute(ElemCopy.java:113)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:370)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:175)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2225)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemCopy.execute(ElemCopy.java:113)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:370)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:175)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2225)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.templates.ElemCopy.execute(ElemCopy.java:124)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2225)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.transformer.TransformerImpl.applyTemplateToNode(TransformerImpl.java:2098)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.transformer.TransformerImpl.transformNode(TransformerImpl.java:1230)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:616)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1147)
06-29 23:49:49.722  2973  2973 I System.out:    at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1125)
06-29 23:49:49.722  2973  2973 I System.out:    at org.docx4j.XmlUtils.transform(XmlUtils.java:835)
06-29 23:49:49.722  2973  2973 I System.out:    at org.docx4j.XmlUtils.transform(XmlUtils.java:728)
06-29 23:49:49.722  2973  2973 I System.out:    at org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart.unmarshal(MainDocumentPart.java:323)
06-29 23:49:49.722  2973  2973 I System.out:    at org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart.unmarshal(MainDocumentPart.java:88)
06-29 23:49:49.722  2973  2973 I System.out:    at org.docx4j.openpackaging.io.LoadFromZipNG.getRawPart(LoadFromZipNG.java:556)
06-29 23:49:49.722  2973  2973 I System.out:    at org.docx4j.openpackaging.io.LoadFromZipNG.getPart(LoadFromZipNG.java:427)
06-29 23:49:49.722  2973  2973 I System.out:    at org.docx4j.openpackaging.io.LoadFromZipNG.addPartsFromRelationships(LoadFromZipNG.java:350)
06-29 23:49:49.722  2973  2973 I System.out:    at org.docx4j.openpackaging.io.LoadFromZipNG.process(LoadFromZipNG.java:243)
06-29 23:49:49.722  2973  2973 I System.out:    at org.docx4j.openpackaging.io.LoadFromZipNG.get(LoadFromZipNG.java:193)
06-29 23:49:49.722  2973  2973 I System.out:    at org.plutext.DocxToHtml.AndroidDocxToHtmlActivity.onCreate(AndroidDocxToHtmlActivity.java:54)



So there is a problem doing the pre-processing.
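Since the trace shows the pre-processing XSLT being run by org.apache.xalan classes, it may help to print which JAXP implementations the device actually resolves at runtime and compare them with a desktop JVM. This is a minimal sketch using only javax.xml; JaxpReport is a hypothetical helper, not part of docx4j. On Android, System.out output appears in logcat as "I System.out", as in the log above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical diagnostic: which JAXP implementation classes does this runtime pick?
final class JaxpReport {
    // Maps each JAXP factory type to the implementation class resolved by newInstance().
    static Map<String, String> implementations() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("SAXParserFactory", SAXParserFactory.newInstance().getClass().getName());
        report.put("DocumentBuilderFactory", DocumentBuilderFactory.newInstance().getClass().getName());
        report.put("TransformerFactory", TransformerFactory.newInstance().getClass().getName());
        return report;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : implementations().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```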

Perhaps because the default SAXParserFactory is null? You could see whether you can configure one in your code (before doing anything docx4j-related) with:

Code:
System.setProperty("javax.xml.parsers.SAXParserFactory", SOMETHING_FOR_ANDROID);


Google 'android javax.xml.parsers.SAXParserFactory'
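One way to make the property non-null without guessing an Android class name is to ask JAXP for the factory it would use anyway and set the property to that class. This is only a sketch: SaxFactoryPin and pinDefaultSaxParserFactory are hypothetical names, the call should run before any docx4j code, and whether it cures the Xalan extension error in the log above has not been verified.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: make the platform's default SAXParserFactory explicit.
final class SaxFactoryPin {
    static final String PROPERTY = "javax.xml.parsers.SAXParserFactory";

    // If the property is unset, sets it to the class name of the factory JAXP
    // resolves by default; returns the value now in effect.
    static String pinDefaultSaxParserFactory() {
        String current = System.getProperty(PROPERTY);
        if (current == null) {
            current = SAXParserFactory.newInstance().getClass().getName();
            System.setProperty(PROPERTY, current);
        }
        return current;
    }
}
```

Calling SaxFactoryPin.pinDefaultSaxParserFactory() at the start of onCreate, before LoadFromZipNG is used, would be the natural place to try it.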

If that doesn't work, other options:

1. dig into the docx4j Android source code

2. make sure your input docx files don't use AlternateContent, or

3. wait for a new docx4j Android release (which won't be until after 3.3.1 is released)
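For option 2, here is a sketch that counts mc:AlternateContent elements in a part, again with JDK APIs only (AlternateContentCheck is a hypothetical name). It checks one part at a time, so headers, footers and other parts would need to be passed in separately.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: does a given part of a docx use mc:AlternateContent?
final class AlternateContentCheck {
    private static final String MC_NS =
            "http://schemas.openxmlformats.org/markup-compatibility/2006";

    // Number of mc:AlternateContent elements in partName; 0 if the part is absent.
    static int countAlternateContent(String docxPath, String partName) throws Exception {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry(partName);
            if (entry == null) {
                return 0;
            }
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // match on the mc namespace URI, not the prefix
            try (InputStream in = zip.getInputStream(entry)) {
                Document doc = dbf.newDocumentBuilder().parse(in);
                return doc.getElementsByTagNameNS(MC_NS, "AlternateContent").getLength();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int n = countAlternateContent(args[0], "word/document.xml");
        System.out.println(n == 0 ? "No mc:AlternateContent found"
                                  : n + " mc:AlternateContent element(s) found");
    }
}
```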

Re: MainDocumentPart getContent always returning empty list

Posted: Sat Jan 19, 2019 10:08 pm
by kinghitesh03
Have you found a solution? I am unable to get the text from a template.

Code:
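import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.JAXBElement;
import org.docx4j.wml.ContentAccessor;

// Recursively collects every object of type toSearch below obj,
// e.g. Text.class for w:t or P.class for w:p.
private static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
    List<Object> result = new ArrayList<>();
    // Many docx4j objects (w:t, for example) arrive wrapped in a JAXBElement.
    if (obj instanceof JAXBElement) obj = ((JAXBElement<?>) obj).getValue();
    // isInstance also matches subclasses and is null-safe, unlike getClass().equals().
    if (toSearch.isInstance(obj)) {
        result.add(obj);
    } else if (obj instanceof ContentAccessor) {
        List<?> children = ((ContentAccessor) obj).getContent();
        for (Object child : children) {
            result.addAll(getAllElementFromObject(child, toSearch));
        }
    }
    return result;
}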


This code works when I use the latest docx4j (6.1.x), but that does not work when I need to add an image, so I downgraded to docx4j 2.8.0 and the corresponding dependencies:
Code:
implementation files('libs/JAXBNamespacePrefixMapper-2.2.4.jar')
implementation files('libs/activation.jar')
implementation files('libs/additionnal.jar')
implementation files('libs/ae-awt.jar')
implementation files('libs/ae-docx4j-2.8.0.jar')
implementation files('libs/ae-jaxb-2.2.5.jar')
implementation files('libs/ae-xmlgraphics-commons.jar')
implementation files('libs/avalon-framework-api-4.3.1.jar')
implementation files('libs/avalon-framework-impl-4.3.1.jar')
implementation files('libs/commons-codec-1.3.jar')
implementation files('libs/commons-io-1.3.1.jar')
implementation files('libs/commons-lang-2.4.jar')
implementation files('libs/commons-logging-1.1.1.jar')
implementation files('libs/istack-commons-runtime.jar')
implementation files('libs/jaxp-datatype.jar')
implementation files('libs/log4j-1.2.15.jar')
implementation files('libs/serializer-2.7.1.jar')
implementation files('libs/stringtemplate-3.2.1.jar')
implementation files('libs/txw2-20110809.jar')


but I am still unable to get the content (text) from the docx file.
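Independent of the docx4j version, you can at least confirm that the text is present by reading the w:t elements directly. This is a minimal sketch with JDK APIs only; DocxPlainText is a hypothetical name, and it assumes the main part is word/document.xml. It ignores fields, tabs and breaks, and text in paragraphs nested inside others (text boxes, both branches of mc:AlternateContent) appears more than once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: plain text of each paragraph, read without docx4j.
final class DocxPlainText {
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // One string per w:p (document order), concatenating the w:t text inside it.
    static List<String> paragraphs(String docxPath) throws Exception {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            try (InputStream in = zip.getInputStream(entry)) {
                Document doc = dbf.newDocumentBuilder().parse(in);
                List<String> result = new ArrayList<>();
                NodeList ps = doc.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < ps.getLength(); i++) {
                    Element p = (Element) ps.item(i);
                    // All descendant w:t, so text of paragraphs nested in this one is included too.
                    NodeList ts = p.getElementsByTagNameNS(W_NS, "t");
                    StringBuilder sb = new StringBuilder();
                    for (int j = 0; j < ts.getLength(); j++) {
                        sb.append(ts.item(j).getTextContent());
                    }
                    result.add(sb.toString());
                }
                return result;
            }
        }
    }
}
```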