Page 1 of 1

Editing Charts and Spreadsheets with pptx4j

PostPosted: Wed Oct 31, 2012 3:49 am
by jeff
Hello,

I'm working on a project that will take an existing PowerPoint presentation with a number of charts embedded in it, update the data then output a new copy. I've found a few hints in the forum about how this should work but missing is an example for saving the binary data back into the EmbeddedPackagePart. All of the examples I've seen on the forum appear to read in existing spreadsheets from a discreet disk file.

Here's how I'm updating Cell values (a naive example):

Syntax: [ Download ] [ Hide ]
Using java Syntax Highlighting
String xlsPartName = "/ppt/embeddings/Microsoft_Excel_Sheet1.xlsx";
               
EmbeddedPackagePart epp  = (EmbeddedPackagePart) ppt
        .getParts().get(new PartName(xlsPartName));
               
InputStream is = BufferUtil.newInputStream(epp.getBuffer());
               
SpreadsheetMLPackage pkg = (SpreadsheetMLPackage) SpreadsheetMLPackage.load(is);
               
Map<PartName,Part> partsMap = pkg.getParts().getParts();                 
Iterator<Entry<PartName, Part>> it = partsMap.entrySet().iterator();
while(it.hasNext()) {
        Map.Entry<PartName, Part> pairs = it.next();
        logger.info(" Key: " + pairs.getKey() + " value: " + partsMap.get(pairs.getKey()));
        if (partsMap.get(pairs.getKey()) instanceof WorksheetPart) {
                WorksheetPart wsp = (WorksheetPart) partsMap.get(pairs.getKey()) ;
                SheetData sheetData = wsp.getJaxbElement().getSheetData();
                List<Row> rows = sheetData.getRow();
               
               for (Row row : rows)
                {
                        List<Cell> cells = row.getC();
                        for (Cell cell : cells)
                        {
                                if (cell.getR().equals("B2") && cell.getV() != null) {
                                        logger.info("B2 CELL VAL: " + cell.getV());
                                        // change the B2 cell value
                                        cell.setV("14.03");
                                }
                                else if (cell.getR().equals("B3") && cell.getV() != null) {
                                        logger.info("B3 CELL VAL: " + cell.getV());
                                        // Change the B3 cell value
                                        cell.setV("22.12");
                                }
                        }                                      
                }
                // TODO Is this necessary?
                spreadSheet.addTargetPart(wsp, AddPartBehaviour.OVERWRITE_IF_NAME_EXISTS);
        }
}

// TODO ... update binary data
// TODO ... update chart
// TODO ... update slide
 
Parsed in 0.018 seconds, using GeSHi 1.0.8.4


Of the 3 steps at the end of the snippet above the only one I haven't been able to figure out is transforming the updated spreadsheet to binary data.
I tried simply saving the modified spreadsheet to disk then reading it in as a FileInputStream but that threw this Exception:

Code: Select all
[com.sun.istack.internal.SAXException2: unable to marshal type "org.xlsx4j.sml.CTTable" as an element because it is missing an @XmlRootElement annotation]
   at com.sun.xml.internal.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:317)
   at com.sun.xml.internal.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:243)
   at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:75)
   at org.docx4j.openpackaging.parts.JaxbXmlPart.marshal(JaxbXmlPart.java:197)
   at org.docx4j.openpackaging.parts.JaxbXmlPart.marshal(JaxbXmlPart.java:175)
   at org.docx4j.openpackaging.io.SaveToZipFile.saveRawXmlPart(SaveToZipFile.java:246)
   at org.docx4j.openpackaging.io.SaveToZipFile.saveRawXmlPart(SaveToZipFile.java:195)
   at org.docx4j.openpackaging.io.SaveToZipFile.savePart(SaveToZipFile.java:419)
   at org.docx4j.openpackaging.io.SaveToZipFile.addPartsFromRelationships(SaveToZipFile.java:382)
   at org.docx4j.openpackaging.io.SaveToZipFile.savePart(SaveToZipFile.java:437)
   at org.docx4j.openpackaging.io.SaveToZipFile.addPartsFromRelationships(SaveToZipFile.java:382)
   at org.docx4j.openpackaging.io.SaveToZipFile.savePart(SaveToZipFile.java:437)
   at org.docx4j.openpackaging.io.SaveToZipFile.addPartsFromRelationships(SaveToZipFile.java:382)
   at org.docx4j.openpackaging.io.SaveToZipFile.save(SaveToZipFile.java:165)
   at org.docx4j.openpackaging.io.SaveToZipFile.save(SaveToZipFile.java:94)
   at com.cogine.test.ppt.PptTestMain.updateChart(PptTestMain.java:166)
   at com.cogine.test.ppt.PptTestMain.main(PptTestMain.java:89)
Caused by: com.sun.istack.internal.SAXException2: unable to marshal type "org.xlsx4j.sml.CTTable" as an element because it is missing an @XmlRootElement annotation
   at com.sun.xml.internal.bind.v2.runtime.XMLSerializer.reportError(XMLSerializer.java:216)
   at com.sun.xml.internal.bind.v2.runtime.ClassBeanInfoImpl.serializeRoot(ClassBeanInfoImpl.java:286)
   at com.sun.xml.internal.bind.v2.runtime.XMLSerializer.childAsRoot(XMLSerializer.java:462)
   at com.sun.xml.internal.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:314)
   ... 16 more
org.docx4j.openpackaging.exceptions.Docx4JException: Failed to add parts from relationships
   at org.docx4j.openpackaging.io.SaveToZipFile.addPartsFromRelationships(SaveToZipFile.java:387)
   at org.docx4j.openpackaging.io.SaveToZipFile.save(SaveToZipFile.java:165)
   at org.docx4j.openpackaging.io.SaveToZipFile.save(SaveToZipFile.java:94)
   at com.cogine.test.ppt.PptTestMain.updateChart(PptTestMain.java:166)
   at com.cogine.test.ppt.PptTestMain.main(PptTestMain.java:89)
Caused by: org.docx4j.openpackaging.exceptions.Docx4JException: Failed to add parts from relationships
   at org.docx4j.openpackaging.io.SaveToZipFile.addPartsFromRelationships(SaveToZipFile.java:387)
   at org.docx4j.openpackaging.io.SaveToZipFile.savePart(SaveToZipFile.java:437)
   at org.docx4j.openpackaging.io.SaveToZipFile.addPartsFromRelationships(SaveToZipFile.java:382)
   ... 4 more
Caused by: org.docx4j.openpackaging.exceptions.Docx4JException: Failed to add parts from relationships
   at org.docx4j.openpackaging.io.SaveToZipFile.addPartsFromRelationships(SaveToZipFile.java:387)
   at org.docx4j.openpackaging.io.SaveToZipFile.savePart(SaveToZipFile.java:437)
   at org.docx4j.openpackaging.io.SaveToZipFile.addPartsFromRelationships(SaveToZipFile.java:382)
   ... 6 more
Caused by: org.docx4j.openpackaging.exceptions.Docx4JException: Problem saving part xl/tables/table1.xml
   at org.docx4j.openpackaging.io.SaveToZipFile.saveRawXmlPart(SaveToZipFile.java:304)
   at org.docx4j.openpackaging.io.SaveToZipFile.saveRawXmlPart(SaveToZipFile.java:195)
   at org.docx4j.openpackaging.io.SaveToZipFile.savePart(SaveToZipFile.java:419)
   at org.docx4j.openpackaging.io.SaveToZipFile.addPartsFromRelationships(SaveToZipFile.java:382)
   ... 8 more



Here a link to another topic that deals with the issue but leaves out how to convert a SpreadsheetMLPackage into binary format.

http://www.docx4java.org/forums/docx-java-f6/how-to-edit-an-embedded-xlsx-values-programatically-t1042.html#p3521

Note that when I have a solution I will happily contribute an example since this use case appears to pop up regularly.

Thanks,

Jeff

Re: Editing Charts and Spreadsheets with pptx4j

PostPosted: Wed Oct 31, 2012 8:34 am
by jason
Hi Jeff, just a quick comment for now .. you'd ordinarily work around the absence of an @XmlRootElement element annotation on "org.xlsx4j.sml.CTTable" by using Marshaller.marshal signature which allows you to specify element name.

But from your stack trace it is SaveToZipFile which is having this problem, which I haven't seen before.

I wonder what part xl/tables/table1.xml looks like?

If it is in your incoming pptx, and docx4j is able to open/save (ie round trip that), it is possible that you're messing that part up somehow.

If what you're doing to it should be OK, then it looks it no problem to add the @XmlRootElement annotation to "org.xlsx4j.sml.CTTable". (ie add it in the docx4j source code, and re-compile)

It should be OK to do that, since in sml-table.xsd,

Syntax: [ Download ] [ Hide ]
Using xml Syntax Highlighting
  <xsd:element name="table" type="CT_Table">
    <xsd:annotation>
      <xsd:documentation>Table</xsd:documentation>
    </xsd:annotation>
  </xsd:element>

  <xsd:complexType name="CT_Table">
:
 
Parsed in 0.001 seconds, using GeSHi 1.0.8.4


and no other elements are of type CT_Table.

Re: Editing Charts and Spreadsheets with pptx4j

PostPosted: Wed Oct 31, 2012 9:36 am
by jeff
Hi Jason,

Thanks for the quick follow up!

If you could tell me what the standard way to convert updated spreadsheet to the binary format needed for EmbeddedPackagePart.setBinaryData() that would help solve the issue.

In the meantime, I'll take a look at the annotation issue in the exception to see if i can sort that out. But if there's a API to easily convert from the SpreadsheetMLPackage object to binary in memory that's really what I'm after.

Anyway, thanks again for your help and quick response. Again, I'll happily contribute a working sample.

--Jeff

EDIT: I added the annotation to org.xlsx4j.sml.CTTable and writing out the updated spreadsheet to a disk file works swimmingly. I don't have enough domain knowledge to know if that's a valid addition but if it is let me know and I'll do a pull request. --Jeff

Re: Editing Charts and Spreadsheets with pptx4j

PostPosted: Wed Oct 31, 2012 10:33 am
by jason
Hi Jeff

Adding the annotation is fine, since in this case there is a 1:1 correspondence between XMl element name and JAXB object.

Does this fix solve your broader problem, or are you still looking for help (in which case I will do more than skim read this topic)?

cheers .. Jason
ps contrib of a sample would be much appreciated :-)

Re: Editing Charts and Spreadsheets with pptx4j

PostPosted: Wed Oct 31, 2012 12:13 pm
by jeff
Hi Jason,

I would like to be able to convert the spreadsheet to a binary format in memory so if you have a solution for that, I'd appreciate your help with it.

Also, I created the pull request for that small change to org.xlsx4j.sml.CTTable.

Thanks again,

Jeff

EDIT: Jason, I figured it out. If we continue my previous example, adding something like this snippet will convert the SpreadsheetMLPackage with the updated worksheet to a byte[] which can be set on the EmbeddedPackagePart. I'll work up a sample ppt deck and associated sample program that does this. Look for the pull request in a bit. Thanks again for your responsiveness and help. Cheers!

Syntax: [ Download ] [ Hide ]
Using java Syntax Highlighting
/*
 *  Convert the Spreadsheet to a binary format, set it on the
 *   EmbeddedPackagePart, add it back onto the deck and save to a file.
 *  
 */

               
ByteArrayOutputStream baos = new ByteArrayOutputStream();
               
SaveToZipFile saver = new SaveToZipFile(spreadSheet);

saver.save(baos);
epp.setBinaryData(baos.toByteArray());

ppt.addTargetPart(epp);

// Write the new file to disk
ppt.save(new java.io.File(outputfilepath));

 
Parsed in 0.014 seconds, using GeSHi 1.0.8.4

Re: Editing Charts and Spreadsheets with pptx4j

PostPosted: Mon Nov 05, 2012 3:31 am
by jeff
I wanted to follow up on this thread to include a snippet of code that demonstrates how to update string values in a spreadsheet since they are handled differently. I found this example in the forums so it's nothing new just wanted to have a more complete picture of the process in a single thread.

Note this example uses Apache commons StringUtils.

Syntax: [ Download ] [ Hide ]
Using java Syntax Highlighting
...
import org.apache.commons.lang.StringUtils;
...

if ( (cell.getR().equals(cId) && !StringUtils.isNumeric(crv.getValue())) ) {
    // If the value is not numeric it is stored in SharedStrings
        CTRst ctRst = new CTRst();
        ctRst.setT(crv.getValue());
        cell.setT(STCellType.INLINE_STR);
        cell.setIs(ctRst);
}
 
Parsed in 0.014 seconds, using GeSHi 1.0.8.4



Cheers,

Jeff

Re: Editing Charts and Spreadsheets with pptx4j

PostPosted: Tue Nov 13, 2012 6:13 am
by jeff
Hello,

One more issue to resolve: it appears that when I update a spreadsheet that backs a chart, the chart itself is not updated without opening the spreadsheet for editing. Is this expected behavior? And if not, is there a trick to getting the chart to render the new data?


Thanks,

Jeff

Re: Editing Charts and Spreadsheets with pptx4j

PostPosted: Wed Nov 21, 2012 12:27 pm
by jason
Please see http://social.msdn.microsoft.com/Forums ... bf7395832/ entitled "Macro to update Object (excel chart) image cache"

Excerpted from that:

<<The solution -- manually -- is to double click on the chart...>>

Right. The equivalent in the object model is to get the InlineShape.OleFormat and use a method on it such as Activate, DoVerb, Edit, Open. I don't know of any way to perform the update without actually "opening" the chart and the problem with this is that it's very difficult to do this such that the user won't need to CLOSE it.

What you can try (but no guarantees):
When the chart in a Word 2010 document is activated, it opens an Excel workbook alongside. You cannot prevent this. (I screamed during beta and was basically told tough luck, this is how it's been designed.)

But you can get that workbook object: Excel.Workbook xlWB = objChart.ChartData.Workbook;

And once you're done, you can close the workbook, which should restore the windows: xlWB.Close();


So, I think what you need to do is experiment with a macro in Word, to do what you want. There is code at http://stackoverflow.com/questions/4838 ... nt-via-vba to get you started.

See also:
http://visualbasic.about.com/od/learnvb ... el01_2.htm
http://msdn.microsoft.com/en-us/library ... =office.12).aspx
http://www.vbaexpress.com/forum/showthread.php?t=43057

Once you have your macro working, you can make it an autoopen macro. You can use docx4j to inject a macro into the docx file (topic for a separate thread, but I do have code for this somewhere).

Re: Editing Charts and Spreadsheets with pptx4j

PostPosted: Thu Nov 22, 2012 1:46 am
by jeff
Hi Jason,

Hope you had a good holiday and thanks for your reply.

We opted to update both the chart and the spreadsheet at same time. I think in terms of code complexity both solutions are equivalent but all the code stays in Java in by just updating the chart too.

Anyway, the first iteration of the project is ready and I'm pretty happy with the outcome so thanks for your help and for docx4j in general.


Cheers,

Jeff

Re: Editing Charts and Spreadsheets with pptx4j

PostPosted: Thu Nov 22, 2012 7:15 am
by jason
jeff wrote:We opted to update both the chart and the spreadsheet at same time. I think in terms of code complexity both solutions are equivalent but all the code stays in Java in by just updating the chart too.


Hi Jeff, yes, that's a much nicer way to do it, if you can make it work, since it means the integrity of the document is not coupled to opening it in Word. I wonder whether what you have done can be abstracted, so docx4j could support doing this for certain chart types? Good to hear things are working out for you. cheers .. Jason