How to remove space characters after paragraph?

Posted: Tue May 16, 2017 8:00 pm
by ludgea
Hi Jason,

I've been trying to delete these spaces from every paragraph of my document.

I tried Option 1 without success: in my code, I replaced every w:after="200" with 0, but it didn't work.

I couldn't find the XML file to modify (since I use the jar directly).

For the last option, I can't define the paragraphs being added, since I use automatically generated HTML converted to XHTML (you might have seen a recent post about something similar in another forum).

Could you help me with Option 2? I think it would be the easiest and best solution, since I want to delete these spaces everywhere, every time.

Thank you

EDIT:

I managed to work through Option 2: I changed styles.xml and set w:after to 0, 1 or 20000, but it had no effect, unlike w:line, which shows a real change when its value is modified.
I guess I'm close to the solution, but I must have missed something.

Re: How to remove space after paragraph?

Posted: Wed May 17, 2017 7:02 pm
by jason
You'll need to post the w:pPr XML, or your docx, so we can see where the spacing is coming from.

Re: How to remove space after paragraph?

Posted: Wed May 17, 2017 7:37 pm
by ludgea
You will find my docx attached (it shows the space before each paragraph and title).

Do you want me to attach the styles.xml too?

Re: How to remove space after paragraph?

Posted: Mon May 22, 2017 8:41 pm
by ludgea
I tried a few things to make it work:
- Changing styles.xml:
Code:
<w:pPrDefault>
  <w:pPr>
    <w:spacing w:after="0" w:line="276" w:lineRule="auto" />
  </w:pPr>
</w:pPrDefault>


I also changed it almost everywhere else a spacing element appears, to see whether it made any difference, but it didn't.
Changing the value of "w:line" to something else does work, though, so I guess my styles.xml is being picked up.

Here's the code used by my application:
Code:
String inputfilepath = "Offers/" + param.getKey1() + "_" + param.getKey2() + "/c.xhtml";

// Create an empty docx package
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
ObjectFactory objectFactory = Context.getWmlObjectFactory();

// Declare the part with its concrete type so no cast is needed
StyleDefinitionsPart stylesPart = new StyleDefinitionsPart();
stylesPart.unmarshalDefaultStyles();

// Add the styles part to the main document part relationships
// (creating it if necessary)
wordMLPackage.getMainDocumentPart().addTargetPart(stylesPart);

NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();

XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
xHTMLImporter.setHyperlinkStyle("Hyperlink");

wordMLPackage.getMainDocumentPart().getContent().addAll(xHTMLImporter.convert(new File(inputfilepath), null));

// Saving file
wordMLPackage.save(new java.io.File("Offers/" + param.getKey1() + "_" + param.getKey2() + "/html_output.docx"));


I'm starting to run out of ideas :?

Re: How to remove space characters after paragraph?

Posted: Tue May 23, 2017 9:15 pm
by jason
I've split this topic (and changed the title to include the word 'characters'), since, looking at your document, your problem is completely different from the thread you replied to: you have physical space bar characters at the start/end of your paragraphs. These are not affected by w:spacing, which is spacing between paragraphs!

You need to go back to your XHTML import, and either get rid of the spaces in the XHTML (i.e. do not pretty print), or indicate that the whitespace is not significant. Feel free to post a couple of paragraphs of your input XHTML.
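
One way to check this diagnosis on your own output is to look for w:t text that starts or ends with a space. The following is only a rough sketch using the JDK's XML parser, not docx4j; it assumes you have already extracted word/document.xml from the docx into a String, and the class and method names (PaddedRunFinder, findPaddedRuns) are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists the text of every w:t element that has
// leading or trailing whitespace, i.e. physical space characters.
class PaddedRunFinder {
    // WordprocessingML main namespace (the "w:" prefix in document.xml)
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static List<String> findPaddedRuns(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match w:t by namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));

            NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
            List<String> padded = new ArrayList<>();
            for (int i = 0; i < texts.getLength(); i++) {
                String text = texts.item(i).getTextContent();
                // trim() strips characters up to U+0020; a difference means
                // the run starts or ends with space, tab or newline
                if (!text.equals(text.trim())) {
                    padded.add(text);
                }
            }
            return padded;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }
}
```

If this returns entries such as " 1. GENERALITES ", the gap comes from characters inside the runs, and changing w:spacing will not remove it.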

Re: How to remove space characters after paragraph?

Posted: Wed May 24, 2017 8:08 pm
by ludgea
Thank you for your answer and the new thread.

Here's my process:
I get an HTML file, then I convert it to XHTML using jTidy.
I tried all the indentation options offered by Tidy, but they don't seem to have any effect on my resulting file.

I'm wondering whether the spaces used for indentation are being picked up by docx4j. Is that possible?

Here's part of my XHTML:
Code:
            <p class="MsoNormal">
              <b>
                <span class="OfferArial_7-5" lang="FR" style='color:blue'>1. GENERALITES</span>
              </b>
            </p>


EDIT: The indentation was the problem. The whitespace before every paragraph in my XHTML was picked up by docx4j and added to the final result.

So now my XHTML looks like this:
Code:
<p class="MsoNormal"><b><span class="OfferArial_7-5" lang="FR" style='color:blue'>1. GENERALITES</span></b></p>


Many thanks for your help.
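
For anyone who cannot turn off pretty printing at the source, the same flattening can be approximated by post-processing the XHTML string before handing it to the importer. This is a minimal sketch in plain Java, not a docx4j or jTidy feature, and the names (InterTagWhitespaceStripper, stripInterTagWhitespace) are invented for the example. It assumes that whitespace sitting purely between two tags is never meaningful in your input. That holds for the snippet above, but it would also delete a real space between adjacent inline elements such as `<b>a</b> <i>b</i>`, and it does not treat `<pre>`, comments or CDATA specially.

```java
// Hypothetical helper: collapses indentation and line breaks that sit
// purely between tags, leaving text inside elements untouched.
class InterTagWhitespaceStripper {
    static String stripInterTagWhitespace(String xhtml) {
        // ">\s+<" only matches when nothing but whitespace separates a
        // closing ">" from the next "<", so "1. GENERALITES" is preserved
        return xhtml.replaceAll(">\\s+<", "><").trim();
    }
}
```

Applied to the indented paragraph posted earlier in the thread, this yields the single-line form shown above; the result could then be written to a file or stream and passed to the XHTML importer as before.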