
Adding document parts to new document

Posted: Mon Oct 26, 2009 7:53 am
by jbeltran
I made the post below a month or two ago and I'm finally getting around to working on the actual problem

viewtopic.php?f=6&t=184

Just to reiterate, I need to add random objects (e.g. OLEObjects, images, etc.) from one document into a new document. Based on the post above, I just need to 1) get the XML from the original document, 2) get the actual object, and 3) update the relationship ID to the object.

For the most part the steps make sense, but I had a couple of follow-up questions:

1) What's the best way to actually get the object from the document after loading the document using WordprocessingMLPackage.load(), and to add that object to a new WordprocessingMLPackage that represents the new document?
2) Once I've retrieved the file, I'm assuming I'll need to create a "Part" and add that part using addTargetPart(), but what type of Part would it be? A BinaryPart? I'm assuming I'll need to convert the file object from the original document to an InputStream since that's what a BinaryPart needs.
3) Are there any code samples that do something like the steps listed above?

Any info would be greatly appreciated :)

Thanks,
Justin

Re: Adding document parts to new document

Posted: Mon Oct 26, 2009 8:46 am
by jason
jbeltran wrote:1) What's the best way to actually get the object from the document after loading the document using WordprocessingMLPackage.load() and add that object to a new WordprocessingMLPackage that represents the new document.


To copy the object in the main document part, Object copy = XmlUtils.deepCopy ought to do it. Then on the target main document part, do addObject(copy).

Remember that you'll need to adjust any relId the copied object points to, so that it points (via the rel which will get added in 2 below) to the correct object.

jbeltran wrote:2) Once I've retrieved the file, I'm assuming I'll need to create a "Part" and add that part using addTargetPart(), but what type of Part would it be? A BinaryPart? I'm assuming I'll need to convert the file object from the original document to an InputStream since that's what a BinaryPart needs.
3) Any code samples that do something the steps listed above?


See samples/ImportForeignPart

You can modify that, so that it exposes the Relationship which addTargetPart returns; you'll need that for (1).

Re: Adding document parts to new document

Posted: Mon Oct 26, 2009 9:18 am
by jbeltran
Jason,

Thanks for the quick response. Will the addObject method do all the work necessary to actually move the binary file to the new package? For example, say an object is referring to "Image1.emf" in the old document. If I do the deepCopy and add that object to the new document using addObject, will it also copy over "Image1.emf" to the new document?

Also, I'm not exactly sure how I would go about creating that new "Part". Looking at the code sample you referenced, I'm not exactly sure what I would pass in.

Part foreignPart = Load.getRawPart(is, foreignCtm, resolvedPartUri);

I'm assuming the InputStream needs to come from the actual file object (i.e. Image1.emf). Am I correct or am I totally off?

Justin

Re: Adding document parts to new document

Posted: Mon Oct 26, 2009 10:07 am
by jason
jbeltran wrote:Will the addObject method do all the work necessary to actually move the actual binary file to the new package?


No, all it does is add something (eg table, p) to the main document part.

Any files you want to add (because they are needed by the thing you added to the main document part), you have to add as new parts.
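One way to see which parts a copied fragment still depends on is to list the relationship IDs its XML refers to. The sketch below is plain Java, not docx4j API: the class RelIdScan is made up for illustration, and it assumes the fragment has already been marshalled to a string and that the relationships namespace is bound to the conventional r: prefix. It only looks at the r:id, r:embed and r:link attributes.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of docx4j.
class RelIdScan {
    // Matches r:id="...", r:embed="..." or r:link="..." and captures the ID value.
    // Assumes the relationships namespace uses the "r" prefix.
    private static final Pattern REL_ATTR =
            Pattern.compile("\\br:(?:id|embed|link)=\"([^\"]+)\"");

    /** Returns the distinct relationship IDs referenced by the fragment, in document order. */
    static Set<String> referencedIds(String xml) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = REL_ATTR.matcher(xml);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }
}
```

Each ID returned would then need a matching part (and relationship) in the target package before the copied object can resolve it.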

jbeltran wrote:Also, I'm not exactly sure how I would go about creating that new "Part".


If the foreignPart is sitting in an existing docx, you can just open the docx (as wordMLPackage1), and get a reference to the part you want to copy (part1).

Then create or open your target wordMLPackage2, and add part1 to it. You don't even need to copy it (provided you don't modify it and then want to save wordMLPackage1 again with the unmodified part1).

Re: Adding document parts to new document

Posted: Mon Oct 26, 2009 9:40 pm
by jbeltran
Jason,

I got it to work using the snippet below, but I have some follow-up questions:

Code:

//Get "Drawing" object
Drawing drawing = getDrawing(oldPkg);

//Get relationship ID from object
String partId = extractDrawingRelationshipId(drawing);
      
//Get part from old package
Part part = oldPkg.getMainDocumentPart().getRelationshipsPart().getPart(partId);
      
//Add target part to new document
Relationship rel = newPkg.getMainDocumentPart().addTargetPart(part);
      
//Add actual drawing object to new document, replacing old relationship ID first
newPkg.getMainDocumentPart().addObject(replaceDrawingRelationship(rel.getId(), drawing));



Is there a better way to do this? Right now I'm using regexes to find and replace the old relationship IDs with the new ones, but it seems somewhat excessive :) Any info would be greatly appreciated.

Thanks!
Justin

Re: Adding document parts to new document

Posted: Mon Oct 26, 2009 11:28 pm
by jason
jbeltran wrote:Is there a better way to do this? Right now I'm using regex's to find and replace the old relationship IDs with the new ones, but it seems somewhat excessive Any info would be greatly appreciated.


Well, for each type of object, the relId is in a known spot in the XML, so you can navigate the object hierarchy to it. That would avoid converting to text and back again (although that is pretty much what deepCopy does anyway, so if performance is critical you could try marshalling the object to a string, applying your regex, then unmarshalling, i.e. avoiding the deepCopy).
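As a rough illustration of changing the ID where it lives rather than by text substitution, here is the same idea on a plain DOM tree using only the JDK. This is not docx4j's JAXB object model; the class RelIdDom and its methods are hypothetical. It matches attributes by the relationships namespace URI instead of by prefix, so it does not depend on the prefix being r: and it leaves ordinary text or other attributes that happen to contain the same ID alone.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
class RelIdDom {
    static final String REL_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    /** Parses a namespace-aware DOM from a string; wraps checked exceptions for brevity. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Sets newId on every relationships-namespace attribute whose value is oldId; returns the count. */
    static int rewrite(Document doc, String oldId, String newId) {
        int changed = 0;
        NodeList all = doc.getElementsByTagNameNS("*", "*");
        for (int i = 0; i < all.getLength(); i++) {
            NamedNodeMap attrs = all.item(i).getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Attr a = (Attr) attrs.item(j);
                if (REL_NS.equals(a.getNamespaceURI()) && oldId.equals(a.getValue())) {
                    a.setValue(newId);
                    changed++;
                }
            }
        }
        return changed;
    }
}
```

With the JAXB objects the equivalent would be to walk to the element that carries the ID and call its setter, which avoids the parse step entirely.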

Re: Adding document parts to new document

Posted: Tue Oct 27, 2009 9:01 am
by jbeltran
Hi Jason,

Another follow-up question. I'm getting a NullPointerException with the following snippet when it gets to the addTargetPart method.

Code:
String relId = extractRelationshipId(o);
Part part = oldPkg.getMainDocumentPart().getRelationshipsPart().getPart(relId);
Relationship rel = newPkg.getMainDocumentPart().addTargetPart(part);


I can retrieve the part with no issue. The part itself is of type BinaryPart (it refers to /word/embeddings/Microsoft_Office_Excel_97-2003_Worksheet1.xls). The problem is that when it gets to addTargetPart, I get the following NullPointerException:

Code:
java.lang.NullPointerException
   at org.docx4j.openpackaging.parts.relationships.RelationshipsPart.addPart(RelationshipsPart.java:424)
   at org.docx4j.openpackaging.Base.addTargetPart(Base.java:186)
...


The issue appears to be that at RelationshipsPart line 424 the code calls getContentType() on the part, but in this particular case the BinaryPart that I retrieved has contentType set to null, hence the NullPointerException.

Am I using the correct steps? Is there something I'm missing? Should contentType have been populated?

Thanks,
Justin

Re: Adding document parts to new document

Posted: Tue Oct 27, 2009 3:31 pm
by jason
jbeltran wrote:but in this particular case, the BinaryPart that I retrieved has contentType set to null hence the NullPointerException.


Thanks for this report. Fixed in revision 954.

You might also want to do:

Code:
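      // note: the argument here is the target's RelationshipsPart
      // (e.g. newPkg.getMainDocumentPart().getRelationshipsPart()),
      // not the Relationship returned by addTargetPart
      part.setOwningRelationshipPart(rel);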
      part.setPackage(newPkg);

Re: Adding document parts to new document

Posted: Wed Oct 28, 2009 10:57 am
by jbeltran
Thanks to your help, Jason, I almost have adding these objects working. Right now I'm running into a couple of issues, and they're tied to the names of the embedded files (which I guess are determined by the PartName class).

If I'm copying parts from only one WordprocessingMLPackage to a new WordprocessingMLPackage, I can copy over the embedded files without any issue. If I rename the docx file to zip, I can see the Excel files were added without any problems. However, if I'm copying parts from MULTIPLE files to a single new WordprocessingMLPackage, the document blows up when I open it. I believe the issue is due to the fact that the source files have embedded objects with the same name. For example, in one package I had a "Microsoft_Office_Excel_97-2003_Worksheet1.xls", and I had a file with the exact same name in another package.

So I figured I'd change the PartNames to point to new names that I create (e.g. 1.xls, 2.xls, etc.) whenever I add a part to the new WordprocessingMLPackage. That way, even if 2 packages have the same names for embedded files, they would have different names in the new package. I used the following code:

Code:
WordprocessingMLPackage newPkg = cvp.getPackage(sanitized);
MainDocumentPart newPkgMainDocPart = newPkg.getMainDocumentPart();

//get relationship id from Object o
String oldRelId = extractRelationshipId(o);

//get part based on relationship Id
Part oldPart = oldPkg.getMainDocumentPart().getRelationshipsPart().getPart(oldRelId);

//if old part is not null, create new part in cost volume pkg
if (oldPart != null) {
   //System.out.println("OLD PART NAME BEFORE UPDATE:  " + oldPart.getPartName());

   //add part and get relationship to new part
   oldPart.setPackage(newPkg);
   oldPart.setOwningRelationshipPart(newPkgMainDocPart.getRelationshipsPart());
   Relationship newRel = cvp.getPackage(sanitized).getMainDocumentPart().addTargetPart(oldPart);

   //update part name to generated specific name
   //NOTE: cvp.getNextExternalObjectPrefixId() returns a new unique ID that I use to generate a unique file name every time I add a part
   Part newPart = newPkgMainDocPart.getRelationshipsPart().getPart(newRel.getId());
   String oldName = newPart.partName.getName();
   String fileExt = ECVTFileUtils.getFilenameExtension(oldName);
   String newName = oldName.replace(fileExt, "_EXT_OBJ_" + cvp.getNextExternalObjectPrefixId() + fileExt);
   newPart.partName = new PartName(newName);  //not sure why Part doesn't have a setter for partName

   System.out.println("OLD PART NAME AFTER UPDATE:  " + oldPart.getPartName());
   System.out.println("NEW PART NAME AFTER UPDATE:  " + newPart.getPartName());

   //update relationship ID reference to use new ID
   return updateRelationshipId(newRel.getId(), o, oldPkg, cvp, sanitized);
} else {
   return null;
}


So now I'm running into 2 issues:

1) The file names in document.xml.rels still reflect the old part names. The files in the actual zip have the new names, but the file names in the Relationship tags in document.xml.rels don't change. For example, say I rename a PartName from "AAA.xls" to "BBB.xls". The actual file in the zip will be "BBB.xls", but document.xml.rels will still show "AAA.xls".

2) Is it possible to create a Part based on an existing Part (essentially a deep copy)? In the code above, the old part in the old document and the new part in the new document end up having the same name, and that's causing problems for me.
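A side note on the renaming step in the code above: String.replace substitutes every occurrence of the extension text, so a name that contains the same text earlier on would be changed in more than one place. The sketch below is plain Java with a hypothetical PartNamer class (nothing here is docx4j API). It inserts the suffix before the last dot of the file name only, and it hands out a new name only when the requested one has already been used in the target package.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of docx4j.
class PartNamer {
    private final Set<String> used = new HashSet<>();
    private int counter = 0;

    /** Returns partName if it is still free, otherwise a variant with a numeric suffix. */
    String allocate(String partName) {
        String candidate = partName;
        // Set.add returns false when the name is already taken, so keep trying new suffixes.
        while (!used.add(candidate)) {
            candidate = withSuffix(partName, "_EXT_OBJ_" + (++counter));
        }
        return candidate;
    }

    /** Inserts suffix before the extension of the last path segment (or appends it if there is none). */
    static String withSuffix(String partName, String suffix) {
        int slash = partName.lastIndexOf('/');
        int dot = partName.lastIndexOf('.');
        if (dot <= slash) {
            return partName + suffix;
        }
        return partName.substring(0, dot) + suffix + partName.substring(dot);
    }
}
```

Renaming only on collision keeps the original names wherever possible; renaming every part, as in the code above, would work just as well if predictable names don't matter.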

Thanks!
Justin

Re: Adding document parts to new document

Posted: Wed Oct 28, 2009 1:06 pm
by jason
You could use a (currently non-existent) method Part.setPartName which you'd call on oldPart before you do addTargetPart.

For your purposes, all it would need to do is set the part name on the part.

But to maintain the integrity of the existing document, it needs to do 2 more things:

1. get its owning relationship (the rel pointing to it), and setTarget

2. in the package's Parts collection, remove the existing part, and put it again under its new name.

This should work, even if the part has things it is pointing to (ie its own RelationshipsPart)
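The bookkeeping described above can be sketched on a toy model. None of the classes below are docx4j classes: Rel, Part and Pkg are hypothetical stand-ins, and the model stores absolute part names as relationship targets for simplicity, whereas targets in a real .rels file are usually relative to the source part.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model for illustration only; these are not docx4j classes.
class RenameSketch {
    /** Stand-in for a relationship entry: an id plus the target it points to. */
    static class Rel {
        final String id;
        String target;
        Rel(String id, String target) { this.id = id; this.target = target; }
    }

    /** Stand-in for a part: its name plus the relationship that points to it. */
    static class Part {
        String name;
        final Rel owningRel;
        Part(String name, Rel owningRel) { this.name = name; this.owningRel = owningRel; }
    }

    /** Stand-in for a package: parts indexed by name. */
    static class Pkg {
        final Map<String, Part> parts = new HashMap<>();
        void add(Part p) { parts.put(p.name, p); }
    }

    static void rename(Pkg pkg, Part part, String newName) {
        if (pkg.parts.containsKey(newName)) {
            throw new IllegalArgumentException("name already in use: " + newName);
        }
        pkg.parts.remove(part.name);      // drop the entry stored under the old name
        part.name = newName;              // set the new name on the part itself
        part.owningRel.target = newName;  // point the owning relationship at the new name
        pkg.parts.put(newName, part);     // re-register the part under its new name
    }
}
```

If the relationship target is left out of this update, the package ends up in the state described earlier in the thread: the file has the new name but the rels entry still shows the old one.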

Given what you've been doing over the last few days, I think you should find this straightforward. If you'd like to contribute it back, I'll add it to the Part class.

cheers .. Jason

Re: Adding document parts to new document

Posted: Wed Oct 28, 2009 4:51 pm
by jbeltran
Hi Jason,

One follow-up question. When you say "in the package's Parts collection, remove the existing part, and put it again under its new name", do you mean remove it from the old document's package or the new document's package? (I.e. if the old doc is A, and I'm "moving" its part to the new doc B, are you saying to remove it from A or from B?) Would the code be something like this?

//add part first time...
part.setPackage(newPkg);
part.setOwningRelationshipPart(newPkgMainDocPart.getRelationshipsPart());
Relationship newRel = cvp.getPackage(sanitized).getMainDocumentPart().addTargetPart(part);

//remove part...
newPkg.getMainDocumentPart().getRelationshipsPart().removePart(part);

//add it back again after changing name
part.setPartName(new PartName("new part name"));
newRel = cvp.getPackage(sanitized).getMainDocumentPart().addTargetPart(part);

//add target to relationship (I thought addTargetPart would do this?)
newRel.setTarget(part);

Is that anywhere close to what you meant? By removing the part and adding it back again, will that update the references in document.xml.rels? What class actually represents that "document.xml.rels" file?

I'm pretty confused still :( You've been very helpful and I almost think I got it but more clarification would be greatly appreciated.

Re: Adding document parts to new document

Posted: Thu Oct 29, 2009 1:47 am
by jason
See http://dev.plutext.org/trac/docx4j/changeset/962, which adds the setPartName method.
There is also a link there to CopyPart, which contains the example:

Code:
p.setPartName( new PartName("/word/embeddings/MySpreadsheet.xlsx") );


In that example, I rename the part in the source package, and only after doing that do I add it to the target package.
Pretty straightforward (provided it works!! :) )