Page 1 of 1

Generated Spreadsheets are incomplete?

PostPosted: Wed Feb 13, 2019 1:57 am
by bjenkins
Jason,
First. Thank GOD for this project! This is an amazingly powerfull library. But I have some questions. It seems that initial creation of spreadsheets is quasi-complete. After creating a spreadsheet programmatically, I have to load it and then turn around and save it for it to be canonical. I discovered this when I tried to use Microsoft's Spreadsheet Compare tool to compare two different xls4j generated spreadsheets and the tool showed my workbooks as basically empty. Once I loaded and saved them, the tool recognized the file structures and that they had data in them.

I've noticed the following after loading and then immediately saving a newly generated spreadsheet:
1. The file size tends to shrink DRAMATICALLY (so Excel must be doing some serious re-org on the file structure)
2. Missing parts like app.xml and core.xml are added by Excel.

For example, after initial creation of a simple spreadsheet that has styles and shared strings, I have the following parts:

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml" PartName="/xl/sharedStrings.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml" PartName="/xl/styles.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml" PartName="/xl/workbook.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml" PartName="/xl/worksheets/topics.xml"/>
</Types>

After I do a load and save I have the following:

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml" PartName="/docProps/app.xml"/>
<Override ContentType="application/vnd.openxmlformats-package.core-properties+xml" PartName="/docProps/core.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml" PartName="/xl/sharedStrings.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml" PartName="/xl/styles.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.theme+xml" PartName="/xl/theme/theme1.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml" PartName="/xl/workbook.xml"/>
<Override ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml" PartName="/xl/worksheets/sheet1.xml"/>
</Types>

So my questions are:

1. Do I need to programmatically create /docProps/app.xml and /docProps/core.xml and maybe /xl/theme/theme1.xml to make a "complete", well-rounded spreadsheet that other tools should be happy with?
2. I named my one and only sheet in the workbook "topics.xml" by issuing the following code: new PartName("/xl/worksheets/topics.xml"), but see that it got renamed internally by Excel (see above in the two sets of <Types>). Is this allowed or should I stick to the naming pattern that excel is expecting, like: new PartName("/xl/worksheets/sheet1.xml")
3. Are there any general rules of thumb to follow to create spreadsheets that are more or less the same compact size that Excel ultimately makes them after doing a load / save?

Thanks again for this wonderful library!

Re: Generated Spreadsheets are incomplete?

PostPosted: Wed Feb 13, 2019 5:09 pm
by jason
Re q1, those parts are optional, but https://github.com/plutext/docx4j/commi ... bcfb74f659 now writes some of the metadata properties, and the theme part.

Q2: Microsoft Office chooses its own part names. You can use others, but it will ignore your names. (The worksheet name which the user sees is set separately)

Q3: generally, don't worry about the file sizes. These are affected by what namespaces are written, and zip compression variations.

cheers .. Jason (sending quickly since I've gotta go!)