
Recognizing Headings

Posted: Wed Dec 05, 2012 7:25 pm
by zkoppanylist
Hi Everybody,

How can I recognize whether a paragraph is a Heading1-....?

Right now I use the code below, but it is not reliable. For example, I have a German docx document where the heading style is reported as "berschrift", yet when I navigate to that paragraph in the English LibreOffice I see "Heading 1".

Regards,

Zsolt

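import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.docx4j.wml.PPr;
import org.docx4j.wml.PPrBase.PStyle;

private static final Pattern heading = Pattern.compile("(berschrift|style|heading)\\s*([1-9])", Pattern.CASE_INSENSITIVE);
....

// Check for heading.
PPr ppr = paragraph.getPPr();
if (ppr != null && ppr.getPStyle() != null && ppr.getPStyle().getVal() != null) {
    PStyle pStyle = ppr.getPStyle();
    String style = pStyle.getVal(); // value of w:pStyle, i.e. the style ID

    Matcher m = heading.matcher(style);
    boolean styleMatches = m.matches(); // true only if the whole ID matches
    ....
}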
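
To see where the pattern-based check stops working, the regular expression from the post can be exercised on its own, without docx4j. The following is only a sketch: the class name `StyleIdHeadingCheck` and the method `levelFromStyleId` are made up for illustration, and "Titre1" and "MyH" are assumed sample IDs standing in for another locale and for a user-defined style.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of docx4j.
final class StyleIdHeadingCheck {
    // Same pattern as in the post above.
    private static final Pattern HEADING =
            Pattern.compile("(berschrift|style|heading)\\s*([1-9])", Pattern.CASE_INSENSITIVE);

    // Returns the level 1..9 parsed from the style ID, or -1 if the ID does not match.
    static int levelFromStyleId(String styleId) {
        if (styleId == null) {
            return -1;
        }
        Matcher m = HEADING.matcher(styleId);
        return m.matches() ? Integer.parseInt(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        // "Titre1" and "MyH" are assumed sample IDs (another locale, a custom style).
        for (String id : new String[] {"Heading1", "berschrift2", "Titre1", "MyH"}) {
            System.out.println(id + " -> " + levelFromStyleId(id));
        }
    }
}
```

The first two IDs yield 1 and 2, while the last two yield -1: every additional locale or custom style name would need its own entry in the pattern, which is why matching on names alone is fragile.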

Re: Recognizing Headings

Posted: Thu Dec 06, 2012 7:38 am
by jason
There is a page on another site somewhere which identifies differences in locales, but I can't find it right now... it would be good if docx4j knew the names of headings at least in common languages.

A style is usually based on another style (for example, Heading 1 on Normal). So a user can also make a style MyH based on Heading 1. A thorough heading recognizer would optionally identify this case.

The alternative approach is to rely on OutlineLvl:

// Types used: org.docx4j.wml.Style, org.docx4j.wml.PPrBase.OutlineLvl

// Returns 1..9 for headings; both strategies return 10 for non-heading text.
protected static int getHeadingLevel(Style s) {
    if (USE_OUTLINE_LVL) {
        return getOutlineLvl(s) + 1;
    } else {
        return getLvlFromStyleId(s);
    }
}

private static int getOutlineLvl(Style s) {
    // Heading 1 is lvl 0
    // There are 9 levels, so Heading 9 will be lvl 8
    // So return 9 for normal text
    if (s == null || s.getPPr() == null) return 9;

    OutlineLvl outlineLvl = s.getPPr().getOutlineLvl();
    if (outlineLvl == null) return 9;
    return outlineLvl.getVal().intValue();
}

private static int getLvlFromStyleId(Style s) {
    if (s == null) return 10;
    String id = s.getStyleId();

    // Only IDs of the form "Heading<N>" are recognized here.
    if (id == null || !id.startsWith("Heading")) return 10;

    try {
        return Integer.parseInt(id.substring(7));
    } catch (NumberFormatException nfe) {
        return 10;
    }
}
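
The "MyH based on Heading 1" case mentioned earlier could be covered by following the basedOn chain until some style in it carries an outline level, since paragraph properties are inherited from the parent style. The sketch below shows only that lookup logic on plain Java objects: `StyleInfo` and `HeadingResolver` are hypothetical stand-ins, not docx4j classes, and in a real document the ID, parent ID and outline level would have to be read from the style definitions first.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a style definition; NOT a docx4j class.
// basedOn is null for a root style; outlineLvl is null if the style sets none itself.
final class StyleInfo {
    final String id;
    final String basedOn;
    final Integer outlineLvl;

    StyleInfo(String id, String basedOn, Integer outlineLvl) {
        this.id = id;
        this.basedOn = basedOn;
        this.outlineLvl = outlineLvl;
    }
}

// Hypothetical helper illustrating the lookup; names are assumptions.
final class HeadingResolver {
    static final int NOT_A_HEADING = 10;

    // Walks the basedOn chain until a style with an outline level is found.
    // Returns 1..9 for headings, NOT_A_HEADING otherwise.
    static int headingLevel(String styleId, Map<String, StyleInfo> styles) {
        Set<String> seen = new HashSet<>(); // guards against cyclic basedOn references
        String current = styleId;
        while (current != null && seen.add(current)) {
            StyleInfo s = styles.get(current);
            if (s == null) {
                break; // unknown style ID
            }
            if (s.outlineLvl != null) {
                int lvl = s.outlineLvl;
                // Outline levels 0..8 map to Heading 1..9; anything else is body text.
                return (lvl >= 0 && lvl <= 8) ? lvl + 1 : NOT_A_HEADING;
            }
            current = s.basedOn;
        }
        return NOT_A_HEADING;
    }

    public static void main(String[] args) {
        Map<String, StyleInfo> styles = new HashMap<>();
        styles.put("Normal", new StyleInfo("Normal", null, null));
        styles.put("berschrift1", new StyleInfo("berschrift1", "Normal", 0));
        styles.put("MyH", new StyleInfo("MyH", "berschrift1", null));

        System.out.println(headingLevel("MyH", styles));    // prints 1
        System.out.println(headingLevel("Normal", styles)); // prints 10
    }
}
```

Because the lookup never inspects the style name, it gives the same answer for "berschrift1" as it would for "Heading1", provided the heading style actually sets an outline level.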

 