
How to extract text from a .docx document into a Java program

Posted: Fri Jun 26, 2015 10:07 am
by Gorkuf
Hey guys, I'm a newbie with docx4j and XML, so I was wondering how to extract the text from a .docx document into a Java text area.
I can load the file with the file chooser, but then I don't know how to extract the paragraphs of text.
Note: it's text only; no images, no special styles, no tables.
Here's proof that I'm not just being lazy; I really need your help because it's urgent:

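import java.io.FileInputStream;
import java.io.FileNotFoundException;
import javax.swing.JFileChooser;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public static void main(String[] args) throws FileNotFoundException, Docx4JException {
    WordprocessingMLPackage wordMLPackage;
    // Let the user pick the .docx file
    JFileChooser window = new JFileChooser();
    int returnValue = window.showOpenDialog(null);
    if (returnValue == JFileChooser.APPROVE_OPTION) {
        // Load the chosen file into a docx4j package; the paragraph text still has to be extracted
        wordMLPackage = WordprocessingMLPackage.load(new FileInputStream(window.getSelectedFile()));
    }
}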
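
Because the document is text only, it may help to see what docx4j reads under the hood: a .docx file is a ZIP archive whose body is stored in word/document.xml, where each paragraph is a w:p element and its text sits in w:t elements. The sketch below deliberately swaps docx4j for plain JDK ZIP and DOM parsing; the class and method names (PlainDocxReader, paragraphs) are hypothetical, and it ignores tabs, line breaks, fields and text boxes, so it is only a fit for simple documents like the one described.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical JDK-only reader, not part of docx4j
public class PlainDocxReader {
    // WordprocessingML main namespace used by w:p and w:t
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of each w:p paragraph in word/document.xml, in document order. */
    public static List<String> paragraphs(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return readParagraphs(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found in archive");
    }

    private static List<String> readParagraphs(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder().parse(xml);
        List<String> result = new ArrayList<>();
        NodeList ps = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < ps.getLength(); i++) {
            // Concatenate all w:t pieces (runs) belonging to this paragraph
            NodeList ts = ((Element) ps.item(i)).getElementsByTagNameNS(W_NS, "t");
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < ts.getLength(); j++) {
                sb.append(ts.item(j).getTextContent());
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

To fill a JTextArea, one could pass a FileInputStream of the chosen file to paragraphs and then call textArea.setText(String.join("\n", ...)) on the result.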

Re: How to extract text from a .docx document into a Java program

Posted: Tue Jul 21, 2015 2:45 pm
by jason
Google 'docx4j textutils'
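
The class in question is org.docx4j.TextUtils. Below is a minimal sketch, assuming docx4j 3.x on the classpath, that applies TextUtils.extractText(Object, Writer) to each top-level paragraph (org.docx4j.wml.P) of the main document part, giving one string per paragraph. The class name DocxParagraphs and the one-string-per-paragraph design are illustrative choices, not part of docx4j; tables and other non-paragraph blocks are simply skipped, which matches a text-only document.

```java
import java.io.File;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import org.docx4j.TextUtils;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.P;

// Hypothetical helper class (assumes docx4j 3.x)
public class DocxParagraphs {

    /** Returns the text of each top-level paragraph in the main document part. */
    public static List<String> paragraphs(WordprocessingMLPackage wordMLPackage) throws Exception {
        List<String> result = new ArrayList<>();
        // getContent() lists the body's block-level children (paragraphs, tables, ...)
        for (Object block : wordMLPackage.getMainDocumentPart().getContent()) {
            if (block instanceof P) {
                StringWriter out = new StringWriter();
                // TextUtils.extractText writes the paragraph's text content to the Writer
                TextUtils.extractText(block, out);
                result.add(out.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File(args[0]));
        // One line per paragraph; the same string could be passed to JTextArea.setText
        System.out.println(String.join("\n", paragraphs(wordMLPackage)));
    }
}
```

Working per paragraph rather than on the whole document keeps the paragraph breaks, which matters when the result goes into a text area.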

Re: How to extract text from a .docx document into a Java program

Posted: Fri Sep 18, 2015 2:23 am
by michaelgeorge
I am excerpting this from a block of my code that does text replacements. While my replacement code works, I haven't tested this excerpt. Try it; hopefully it either works for you or gives you a hint about where to go from here. Good luck!

   import java.io.File;
   import java.util.List;
   import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
   import org.docx4j.wml.Text;

   public static void main(String[] args) throws Exception {
      String fileName = "yourdocumentname.docx";
      WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File(fileName));
      // Collect every w:t (Text) element under the main document part;
      // getAllElementFromObject is a recursive helper from my replacement code (not shown)
      List<Object> texts = getAllElementFromObject(wordMLPackage.getMainDocumentPart(), Text.class);
      for (Object t : texts) {
         Text content = (Text) t;
         System.out.println(content.getValue());
      }
   }
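
The excerpt calls getAllElementFromObject without showing it. One common way to write such a helper is a recursive walk that unwraps JAXBElement wrappers and descends into anything implementing org.docx4j.wml.ContentAccessor. The version below is a hypothetical reconstruction, not michaelgeorge's actual code; the class name Docx4jWalk is made up, and it assumes a docx4j 3.x-era build that uses javax.xml.bind (newer docx4j releases may use jakarta.xml.bind instead).

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.JAXBElement;
import org.docx4j.wml.ContentAccessor;

// Hypothetical reconstruction of the helper used in the excerpt
public class Docx4jWalk {

    /**
     * Recursively collects every object of class toSearch beneath obj.
     * JAXBElement wrappers are unwrapped first; only ContentAccessor nodes
     * (main document part, paragraphs, runs, ...) are descended into.
     */
    public static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
        List<Object> result = new ArrayList<>();
        if (obj instanceof JAXBElement) {
            obj = ((JAXBElement<?>) obj).getValue();
        }
        if (obj.getClass().equals(toSearch)) {
            result.add(obj);
        } else if (obj instanceof ContentAccessor) {
            for (Object child : ((ContentAccessor) obj).getContent()) {
                result.addAll(getAllElementFromObject(child, toSearch));
            }
        }
        return result;
    }
}
```

If the method is pasted into the same class as the excerpt's main, the unqualified call there resolves to it. Note that it returns individual w:t pieces, so one paragraph may print as several lines when its text is split across runs.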