
undeleted temp files in the temp directory

Posted: Tue Jul 06, 2010 6:15 am
by nicoretti
When the createImagePart method of the BinaryPartAbstractImage class is used, a temp file is created which is not deleted after use.
As a result, the temp directory of the user running the JVM in which docx4j is used fills up if the method is called frequently.

- Used jar file: http://dev.plutext.org/docx4j/docx4j-nightly-20100630.jar
- Package containing the mentioned class: org.docx4j.openpackaging.parts.WordprocessingML

"Bug" - Environment:
- User: smokie
- OS: Windows 7 x64
- Tempdir = C:\Users\smokie\AppData\Local\Temp
(I could also reproduce the bug under Windows XP; only the temp directory's location differs.)

Code:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.docx4j.dml.wordprocessingDrawing.Inline;
import org.docx4j.jaxb.Context;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage;
import org.docx4j.utils.BufferUtil;
import org.docx4j.wml.Drawing;
import org.docx4j.wml.P;
import org.docx4j.wml.R;


public class Main {

   /**
    * Just a simple example which reproduces a bug in the docx4j framework.
    */
   public static void main(String[] args) throws Exception {
      
       // Create the package
       WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
      
       // file object pointing to the picture I want to use
       File myImage = new File("C:\\Users\\smokie\\Pictures\\me.jpg");
       // open an input stream to get the bytes of the picture
       InputStream inputStream = new FileInputStream(myImage);
       // get the bytes of the picture
       byte[] imageBytes = BufferUtil.getBytesFromInputStream(inputStream);
       // the next line of code will produce the "error"
       // the method call will create a temp file in the users temp dir which is not removed!!
       BinaryPartAbstractImage imagePart = BinaryPartAbstractImage.createImagePart(wordMLPackage, imageBytes);
      
       // close the input stream
       inputStream.close();
      
       // put the picture into the document
       Inline inlineImage = imagePart.createImageInline("filenameHint", "altText", 1, 2, false);
       P p = Context.getWmlObjectFactory().createP();
       R r = Context.getWmlObjectFactory().createR();
       Drawing drawing = Context.getWmlObjectFactory().createDrawing();
       drawing.getAnchorOrInline().add(inlineImage);
       r.getRunContent().add(drawing);
       p.getParagraphContent().add(r);
       wordMLPackage.getMainDocumentPart().addObject(p);
      
       // Save the document
       wordMLPackage.save(new java.io.File("C:\\Users\\smokie\\DocxWithImage.docx") );
   }
}
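
To confirm the leak after running the example above, one could count the leftover files in the temp directory. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes the leftovers keep the "img" prefix and ".img" suffix passed to File.createTempFile in the docx4j code quoted in this post.

```java
import java.io.File;

class TempLeakCheck {

    /**
     * Counts the files in dir whose names start with "img" and end with ".img",
     * i.e. the naming pattern assumed for the leaked temp files.
     */
    static int countLeftovers(File dir) {
        File[] matches = dir.listFiles(
                (d, name) -> name.startsWith("img") && name.endsWith(".img"));
        // listFiles returns null if dir is not a readable directory
        return matches == null ? 0 : matches.length;
    }

    public static void main(String[] args) {
        File tmpDir = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(countLeftovers(tmpDir)
                + " leftover img*.img file(s) in " + tmpDir.getAbsolutePath());
    }
}
```

Running it before and after a batch of createImagePart calls should show whether the count grows. Other programs may use the same naming pattern, so the absolute number is only a rough indicator.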


Buggy code:
   /**
    * Create an image part from the provided byte array, attach it to the source part
    * (eg the main document part, a header part etc), and return it.
    *
    * @param wordMLPackage
    * @param sourcePart
    * @param bytes
    * @return
    * @throws Exception
    */
   public static BinaryPartAbstractImage createImagePart(
         WordprocessingMLPackage wordMLPackage,
         Part sourcePart, byte[] bytes) throws Exception {
            
      // Whatever image type this is, we're going to need
      // to know its dimensions.
      // For that we use ImageInfo, which can only
      // load an image from a URI.
      
      // So first, write the bytes to a temp file      
      File tmpImageFile = File.createTempFile("img", ".img");
      
      FileOutputStream fos = new FileOutputStream(tmpImageFile);
      fos.write(bytes);
      fos.close();
      log.debug("created tmp file: " +  tmpImageFile.getAbsolutePath() );
            
      ImageInfo info = ensureFormatIsSupported(tmpImageFile.getAbsolutePath(), tmpImageFile, bytes);
      
      // In the absence of an exception, tmpImageFile now contains an image
      // Word will accept
      
      ContentTypeManager ctm = wordMLPackage.getContentTypeManager();
      String proposedRelId = sourcePart.getRelationshipsPart().getNextId();
      // In order to ensure unique part name,
      // idea is to use the relId, which ought to be unique
      BinaryPartAbstractImage imagePart =
         (BinaryPartAbstractImage)ctm.newPartForContentType(
            info.getMimeType(),
            IMAGE_PREFIX + proposedRelId );
            
      log.debug("created part " + imagePart.getClass().getName() +
            " with name " + imagePart.getPartName().toString() );      
      
      FileInputStream fis = new FileInputStream(tmpImageFile); //reuse      
      imagePart.setBinaryData( fis );
            
      imagePart.rel =  sourcePart.addTargetPart(imagePart, proposedRelId);
      
      imagePart.setImageInfo(info);

        // Delete the tmp file
      tmpImageFile.delete();
      
      return imagePart;
      
   }


The fix is probably as follows (I recently had a similar problem):

FileInputStream fis = new FileInputStream(tmpImageFile); //reuse
imagePart.setBinaryData( fis );
fis.close();
imagePart.rel = sourcePart.addTargetPart(imagePart, proposedRelId);
imagePart.setImageInfo(info);
// Delete the tmp file, and report it if the deletion fails
if (!tmpImageFile.delete()) {
   log.warn("could not delete tmp file: " + tmpImageFile.getAbsolutePath());
}
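
The same idea, closing every stream before deleting and checking the result of delete(), can be sketched outside docx4j. This is a hedged, standalone illustration and not the docx4j code: the class and method names are invented, and it assumes Java 7 or later for try-with-resources.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class TempFileRoundTrip {

    /**
     * Writes bytes to a temp file, reads them back, then deletes the file.
     * Returns true if the temp file was deleted.
     */
    static boolean writeReadDelete(byte[] bytes) throws IOException {
        File tmp = File.createTempFile("img", ".img");
        boolean deleted;
        try {
            // Each stream is closed when its try block ends, even on an exception
            try (OutputStream out = new FileOutputStream(tmp)) {
                out.write(bytes);
            }
            try (InputStream in = new FileInputStream(tmp)) {
                byte[] buf = new byte[8192];
                while (in.read(buf) != -1) {
                    // consume the data; a real caller would use it here
                }
            }
        } finally {
            deleted = tmp.delete();
            if (!deleted) {
                // Report the failure and fall back to deletion at JVM exit
                System.err.println("could not delete " + tmp.getAbsolutePath());
                tmp.deleteOnExit();
            }
        }
        return deleted;
    }
}
```

The deleteOnExit() fallback is a design choice for the sketch; in a long-running server it only helps when the JVM eventually shuts down normally.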


I would have attached a patch, but I couldn't test the fix because I wasn't able to build the project from source.
Thanks for developing such a nice open source framework. I hope this post helps to improve it.

so long
Nicola Coretti

Re: undeleted temp files in the temp directory

Posted: Thu Jul 08, 2010 10:13 pm
by jason
Hi Nicola

Thanks for your kind words, and for drawing this issue to my attention.

In fact imagePart.setBinaryData( fis ) already closes fis.

http://dev.plutext.org/trac/docx4j/changeset/1140 does close the files properly for me on Win 7 x64. On Win XP, they seemed to close OK for me already.

You can get it in http://dev.plutext.org/docx4j/docx4j-ni ... 100708.jar

cheers .. Jason

Re: undeleted temp files in the temp directory

Posted: Fri Jul 09, 2010 5:48 pm
by jason
You can use docx4j-2.4.0, which I released today. See the release sticky above for details.

Re: undeleted temp files in the temp directory

Posted: Sat Jul 10, 2010 2:26 am
by nicoretti
Hi Jason, I have now managed to build docx4j, so next time I can attach a patch :)
Thanks for the new version of the framework.
I also did some debugging: it wasn't (and isn't) just the input stream.

After this line of code, the file cannot be deleted:
ImageInfo info = ensureFormatIsSupported(tmpImageFile.getAbsolutePath(), tmpImageFile, bytes);
tmpImageFile.delete(); // will return false

I am pretty busy for the next few days. When I find some time, I may be able to locate the exact spot that prevents the file from being deleted.

Here is what I know so far about why some files cannot be deleted:

If a stream to a file is open, the JVM "acquires" a lock, so that other applications, or other code running in the same JVM, can't delete the file.
So if someone opens an input or output stream to a file, the file can't be deleted until the stream is closed or the stream object is garbage collected.
The catch: a local stream object isn't collected as soon as execution leaves its scope (that depends on the garbage collector).

So this is probably why a System.gc() call such as

fos = null;
fis = null;
System.gc();

is suggested as a solution.
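
As an alternative to relying on the garbage collector, a stream can be closed deterministically, and the reason for a failed delete can be made visible. File.delete() only returns false, whereas java.nio.file.Files.delete throws an IOException that says what went wrong. A minimal sketch, assuming Java 7 or later (the class and method names are made up for illustration):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class DeleteDiagnostics {

    /** Returns null if the file was deleted, otherwise a description of the failure. */
    static String tryDelete(Path file) {
        try {
            Files.delete(file);
            return null;
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("img", ".img");
        try (InputStream in = new FileInputStream(tmp.toFile())) {
            in.read();
        } // the stream is closed here, without waiting for the garbage collector
        String problem = tryDelete(tmp);
        System.out.println(problem == null ? "deleted" : problem);
    }
}
```

Calling tryDelete right after the ensureFormatIsSupported line might help narrow down what still holds the file open, since the exception text is usually more specific than a bare false.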

so long
Nico