I don't know if you've actually examined the OOXML format, but I spent about 6 m...

lars · on Feb 23, 2011

I've also done some work with the Excel file format for a personal project (.xls, not .xlsx). I think it should be made clear (not that you're implying otherwise) that a lot of this mess isn't deliberate obfuscation on Microsoft's part.

For instance, the SharedStringTable made a lot of sense when documents had to fit on floppy disks. Excel is 26 years old, and when you evolve a file format for that long while trying to maintain backwards compatibility, you'll inevitably end up with a messy format.
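To illustrate why a shared string table saves space, here is a minimal sketch of the idea rather than Excel's actual on-disk layout: each distinct string is stored once, and cells hold an index into the table. The function name `build_shared_strings` and the flat list of cell values are assumptions made up for this example.

```python
def build_shared_strings(cells):
    """Store each distinct string once; cells refer to it by index.

    Hypothetical helper for illustration only, not Excel's real format.
    """
    table = []      # distinct strings, in order of first appearance
    index_of = {}   # string -> position in table
    indexed = []    # one table index per input cell
    for value in cells:
        if value not in index_of:
            index_of[value] = len(table)
            table.append(value)
        indexed.append(index_of[value])
    return table, indexed


table, indexed = build_shared_strings(["Yes", "No", "Yes", "Yes", "No"])
```

With many repeated labels, the table stays small while the cells shrink to integers, which is the kind of saving that mattered on a floppy disk.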

VMG · on Feb 23, 2011

I doubt that backwards compatibility is a priority in MS Office development.

brudgers · on Feb 23, 2011

I've looked at OOXML on the Word side, and I would agree that although it is human readable, the degree of cross-referencing makes understanding a typical document non-trivial. On the other hand, the technical threshold for handling the sort of issues you mention is lower in XML than in binary data. Given the scale at which Microsoft operates, that probably translates into higher productivity for customers who programmatically manipulate or create Office documents.

Although, all things being equal, unused entries and the potential for bloat are undesirable, avoiding them is rarely the primary goal of a project. Since compression is built into the file format, bloat is probably not among the chief considerations outside of outlying projects (even if it is aesthetically unappealing). And again, garbage collection within a binary format is an even more challenging task.
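As a rough sketch of what "garbage collecting" unused entries involves, assuming a simplified model where cells are just a list of indices into a string table (the names `prune_shared_strings`, `table`, and `indexed_cells` are hypothetical, not part of any Office API):

```python
def prune_shared_strings(table, indexed_cells):
    """Drop table entries no cell refers to and renumber the rest.

    Simplified model for illustration; a real document has many more
    places that can reference the table.
    """
    used = sorted(set(indexed_cells))                    # indices still referenced
    remap = {old: new for new, old in enumerate(used)}   # old index -> new index
    new_table = [table[i] for i in used]
    new_cells = [remap[i] for i in indexed_cells]
    return new_table, new_cells


new_table, new_cells = prune_shared_strings(["a", "unused", "b"], [2, 0, 2])
```

Even in this toy version, every reference has to be found and rewritten, which hints at why the same cleanup is harder when the references are offsets buried in a binary stream.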