Check styles in docx using docx4j.

Posted: Mon Apr 11, 2016 12:54 am
by Jarey
I'm trying to read the full content of a docx with docx4j and check whether all the formatting applied to that content is what I expect.

Specifically, I want to check that all the text in the docx is written in Arial, size 12, with double line spacing. The formatting can be applied directly on the run, on the paragraph, through the document defaults, or through custom styles defined in the docx.
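
A note on units for anyone following along: as far as I know, WordprocessingML stores the font size (`w:sz`) in half-points and, when `w:lineRule` is `auto`, the line spacing (`w:line`) in 240ths of a line. A minimal sketch of the conversion, where `WmlUnits` and its method names are made up for illustration and are not part of docx4j:

```java
import java.math.BigInteger;

/** Hypothetical helper (not a docx4j class): converts readable targets to raw WordprocessingML values. */
final class WmlUnits {

    /** w:sz holds the font size in half-points. */
    static BigInteger halfPoints(double points) {
        return BigInteger.valueOf(Math.round(points * 2));
    }

    /** With w:lineRule="auto", w:line holds the spacing in 240ths of a line. */
    static BigInteger autoLineUnits(double lines) {
        return BigInteger.valueOf(Math.round(lines * 240));
    }
}
```

So "size 12, double spacing" would be expected to appear as `w:sz` 24 and `w:line` 480 in the document XML.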

From what I have read, the way to do it in docx4j is the following (please correct me if I am wrong):

    Load the document from disk:
Code: Select all
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File(docPath));

    Iterate over its paragraphs or runs:
    (paragraphs):
Code: Select all
// Paragraphs
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
PropertyResolver propertyResolver = new PropertyResolver(wordMLPackage);
final String XPATH_TO_SELECT_PARAGRAPH_NODES = "//w:p";

List<Object> jaxbNodes = documentPart.getJAXBNodesViaXPath(XPATH_TO_SELECT_PARAGRAPH_NODES, true);
int i = 1;
// Iterate over each paragraph.
for (Object jaxbNode : jaxbNodes) {
    final String paragraphString = jaxbNode.toString();
    System.out.println("[Start]: " + paragraphString);
    P paragraph = (P) XmlUtils.unwrap(jaxbNode);

    PPr paragraphProperties = paragraph.getPPr();

    if (paragraphProperties != null && paragraphProperties.getPStyle() != null) {
        String style = paragraphProperties.getPStyle().getVal();
        if (style != null) {
            System.out.println("The style of paragraph " + i + " is: " + style);
        }
    }
    // Obtain the effective properties (styles plus direct formatting).
    PPr estiloPPr = propertyResolver.getEffectivePPr(paragraphProperties);
    i++;
}

    (runs):
Code: Select all
// Runs
// Reuses wordMLPackage, propertyResolver and documentPart declared earlier;
// log is assumed to be an existing logger.
final String XPATH_TO_SELECT_RUN_NODES = "//w:r";
wordMLPackage = WordprocessingMLPackage.load(new File(docPath));
propertyResolver = new PropertyResolver(wordMLPackage);
documentPart = wordMLPackage.getMainDocumentPart();
List<Object> jaxbRunNodes = documentPart.getJAXBNodesViaXPath(XPATH_TO_SELECT_RUN_NODES, true);
for (Object jaxbRun : jaxbRunNodes) {

    R run = (R) XmlUtils.unwrap(jaxbRun);

    // Direct run formatting only; this can be null when everything comes from styles.
    RPr runProps = run.getRPr();
    if (runProps != null) {
        log.info("Run fonts: " + runProps.getRFonts());
        log.info("Font size: " + runProps.getSz());
        log.info("Run style: " + runProps.getRStyle());
        log.info("Run spacing: " + runProps.getSpacing());
    } else {
        log.info("Run doesn't have its own styling.");
    }
}



Even though I have checked the "Getting Started" guide and done some research into how the export mechanism obtains the styling properties for each piece of text when exporting to PDF or HTML, I'm finding it quite difficult to implement my styling checks.

I've read about using the PropertyResolver class to obtain the effective styling applied to a given run, but I don't fully understand how to use it, because I can't see the complete styling information when debugging the code snippets above.
I've also read the "Traversing a docx" example (OpenMainDocumentAndTraverse.java), but, as I said, I can't work out from it how to do what I want.
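
To make the idea of "effective" styling concrete, here is a toy sketch in plain Java (it is not docx4j code). It assumes the usual WordprocessingML precedence, where document defaults are overridden by the paragraph style, then by the run style, then by direct formatting. The class `LayeredProperties` and the map keys are hypothetical, and the real resolution also follows `basedOn` chains and treats toggle properties such as bold differently:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of property layering; hypothetical, not how docx4j is implemented. */
final class LayeredProperties {

    /** Merges layers in order; a later layer overrides an earlier one for the same key. */
    @SafeVarargs
    static Map<String, String> resolve(Map<String, String>... layers) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            effective.putAll(layer);
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> docDefaults = Map.of("font", "Calibri", "sz", "22");
        Map<String, String> paragraphStyle = Map.of("font", "Arial");
        Map<String, String> directRunFormatting = Map.of("sz", "24");
        // Lowest-priority layer first, highest-priority layer last.
        System.out.println(resolve(docDefaults, paragraphStyle, directRunFormatting));
    }
}
```

In this sketch a run with no direct font setting still ends up as Arial because the paragraph style supplies it, which is why inspecting only `run.getRPr()` does not show the full picture.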

Any advice is really appreciated, even if it means solving the problem in another way. (I'm new to docx4j, so feel free to suggest any approach you consider better.)

Thanks in advance.

Re: Check styles in docx using docx4j.

Posted: Mon Apr 11, 2016 11:47 pm
by jason
For PPr, suggest you look at https://github.com/plutext/docx4j-expor ... .java#L263

For RPr, https://github.com/plutext/docx4j-expor ... .java#L597

XPath is ok, but I'd probably use traverse.

I might knock something together tomorrow...

Re: Check styles in docx using docx4j.

Posted: Tue Apr 12, 2016 4:04 am
by Jarey
jason wrote: For PPr, suggest you look at https://github.com/plutext/docx4j-expor ... .java#L263

For RPr, https://github.com/plutext/docx4j-expor ... .java#L597

XPath is ok, but I'd probably use traverse.

I might knock something together tomorrow...


First of all, thank you very much for your answer, Jason. I'm looking at the resources you pointed out to see whether they help me make progress on the problem.

Any suggestion you can come up with would be very useful.

Thanks again.

Re: Check styles in docx using docx4j.

Posted: Tue Apr 12, 2016 5:31 pm
by jason
You could start with something like:

Code: Select all
import java.util.List;
import java.util.Map.Entry;

import org.docx4j.Docx4J;
import org.docx4j.TraversalUtil;
import org.docx4j.TraversalUtil.CallbackImpl;
import org.docx4j.XmlUtils;
import org.docx4j.model.PropertyResolver;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.exceptions.InvalidFormatException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.Part;
import org.docx4j.openpackaging.parts.PartName;
import org.docx4j.openpackaging.parts.WordprocessingML.FooterPart;
import org.docx4j.openpackaging.parts.WordprocessingML.HeaderPart;
import org.docx4j.wml.P;
import org.docx4j.wml.PPr;
import org.docx4j.wml.R;
import org.docx4j.wml.RPr;



public class DisplayProperties {

    private final WordprocessingMLPackage pkg;

    public DisplayProperties(WordprocessingMLPackage pkg) {
        this.pkg = pkg;
    }

    public static void main(String[] args) throws Docx4JException {

        String inputfilepath = System.getProperty("user.dir") + "/sample-docx.docx";

        WordprocessingMLPackage pkg = Docx4J.load(new java.io.File(inputfilepath));

        DisplayProperties dp = new DisplayProperties(pkg);
        dp.applyCallbackToParts();
    }

    private void applyCallbackToParts() throws InvalidFormatException {

        FormattingLister formattingLister = new FormattingLister();
        formattingLister.propertyResolver = pkg.getMainDocumentPart().getPropertyResolver();

        if (pkg.getMainDocumentPart().getStyleDefinitionsPart() == null) {
            System.out.println("no styles part!");
            return;
        }

        // Apply the callback to the main document part (MDP)
        new TraversalUtil(pkg.getMainDocumentPart().getJaxbElement(), formattingLister);

        // Apply the callback to headers/footers
        for (Entry<PartName, Part> entry : pkg.getParts().getParts().entrySet()) {

            Part p = entry.getValue();

            if (p instanceof HeaderPart) {
                new TraversalUtil(((HeaderPart) p).getJaxbElement().getEGBlockLevelElts(), formattingLister);
            }

            if (p instanceof FooterPart) {
                new TraversalUtil(((FooterPart) p).getJaxbElement().getEGBlockLevelElts(), formattingLister);
            }
        }

        // Could also do endnotes/footnotes
        // and comments
    }

    public static class FormattingLister extends CallbackImpl {

        PropertyResolver propertyResolver;

        // Effective properties of the paragraph currently being traversed
        // (despite the name, this is the resolved PPr, not only the direct formatting)
        PPr pPrDirect;

        @Override
        public List<Object> apply(Object o) {

            if (o instanceof P) {

                P p = (P) o;
                pPrDirect = propertyResolver.getEffectivePPr(p.getPPr());

                // Guard against a null result as well as missing spacing
                if (pPrDirect != null && pPrDirect.getSpacing() != null) {
                    System.out.println(XmlUtils.marshaltoString(pPrDirect.getSpacing()));
                }
            }

            if (o instanceof R) {

                // Resolve the run properties in the context of the enclosing paragraph
                R r = (R) o;
                RPr rPr = propertyResolver.getEffectiveRPr(r.getRPr(), pPrDirect);
                System.out.println(XmlUtils.marshaltoString(rPr));
            }

            return null;
        }
    }
}
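
If the goal is a pass/fail check instead of printed XML, one possible next step is a small rule checker fed with the resolved values. The sketch below is plain Java with hypothetical names (`StyleCheck`, `violations`). It assumes the inputs are read inside the callback above from the effective properties, for example `rPr.getRFonts().getAscii()`, `rPr.getSz().getVal()`, and `pPrDirect.getSpacing().getLine()` / `getLineRule()`, each with a null check, and it assumes that 12 pt is stored as 24 half-points and double spacing as 480 with the `auto` line rule:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checker (not part of docx4j); feed it values read from the effective RPr/PPr. */
final class StyleCheck {

    static final BigInteger SZ_12PT = BigInteger.valueOf(24);      // w:sz is in half-points
    static final BigInteger LINE_DOUBLE = BigInteger.valueOf(480); // 240ths of a line

    /** Returns one message per broken rule; an empty list means the run passes. */
    static List<String> violations(String font, BigInteger sz, BigInteger line, String lineRule) {
        List<String> problems = new ArrayList<>();
        if (!"Arial".equalsIgnoreCase(font)) {
            problems.add("font is " + font);
        }
        if (!SZ_12PT.equals(sz)) {
            problems.add("w:sz is " + sz);
        }
        // Assumption: a missing lineRule is treated the same as "auto".
        boolean autoRule = lineRule == null || "auto".equals(lineRule);
        if (!autoRule || !LINE_DOUBLE.equals(line)) {
            problems.add("spacing is line=" + line + ", lineRule=" + lineRule);
        }
        return problems;
    }
}
```

Note that a font can also come from the theme, in which case the ASCII font name may be null and this sketch would report it as a violation.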

 

Re: Check styles in docx using docx4j.

Posted: Thu Apr 14, 2016 6:27 am
by Jarey
Thank you very much, Jason. I'll see if I can build on your code and achieve my goal of style checking.

Kind regards.