I don't know if you've actually examined the OOXML format, but I spent about 6 months dealing with it in great detail at my old job, specifically Excel. For simple things, conventional XML manipulation tools could potentially be useful, but there are some serious issues.
For one thing, the files Office generates and consumes differ significantly from the format as specified in the ECMA and ISO standards. The schemas for ECMA-376 will specify elements as occurring in any order, while Excel will require those elements to appear in a fixed order. Tons of tiny problems like this exist everywhere, and you only need one mistake to create a malformed document. The machine-readable XML schemas provided by Microsoft will not work without massaging.
The format also makes it very difficult to make simple in-place changes. A good example of this is what you might consider the most basic information in a spreadsheet: cells. Cell tags are contained in row tags, which are in turn contained in a sheetData tag. Rows contain their row number as an attribute, while cells contain their A1 "row/column" ID. This means that if you remove or reposition a row in a spreadsheet, you must update every row and every cell that comes after it.
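To make the renumbering concrete, here is a rough sketch of what deleting a single row involves. It is only an illustration, not how any particular library does it: `delete_row` is a hypothetical helper, and it assumes every row and cell carries an explicit `r` attribute. A real edit would also have to fix anything else that refers to rows, such as formulas and merged-cell ranges.

```python
import re
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet parts.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
# Splits an A1-style reference such as "AB12" into column letters and row digits.
CELL_REF = re.compile(r"([A-Z]+)(\d+)")


def delete_row(sheet_data, row_number):
    """Remove one <row> from <sheetData> and shift every later row up by one.

    Hypothetical helper: assumes explicit r attributes on rows and cells,
    and ignores formulas, merged cells and everything else that may
    reference the shifted rows.
    """
    for row in list(sheet_data.findall(f"{{{NS}}}row")):
        r = int(row.get("r"))
        if r == row_number:
            sheet_data.remove(row)
        elif r > row_number:
            # The row number lives on the row element...
            row.set("r", str(r - 1))
            # ...and again inside every cell reference in that row.
            for cell in row.findall(f"{{{NS}}}c"):
                col, _ = CELL_REF.fullmatch(cell.get("r")).groups()
                cell.set("r", f"{col}{r - 1}")
```

Even in this stripped-down form, a one-row deletion touches every element that follows it in the sheet.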
How about extracting a string value from a cell? The format makes it possible to store inline strings directly in a given cell, but Excel itself never uses this functionality. Instead, the cell contains an index into the SharedStringTable (SST), which is stored as a separate XML document. If you delete or modify a cell, the SST entry it referenced may no longer be used by anything. The only way to know is to search the document globally, remembering that dozens of different elements could potentially refer to a shared string. If your goal is to avoid bloating documents with junk, you have to solve this problem for a number of cases: style references, fonts and more.
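For what it's worth, the read side of that indirection looks roughly like this. Again a sketch under assumptions: `load_shared_strings` and `cell_text` are made-up names, the two roots are assumed to be already parsed from the shared strings part and a worksheet part, and rich-text entries are flattened by simply joining their runs.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, bound to a short prefix for path queries.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
M = {"m": NS}


def load_shared_strings(sst_root):
    """Flatten each <si> entry of the shared string table into plain text.

    Hypothetical helper: handles a plain <t> child or rich-text <r><t> runs
    and ignores everything else in the entry.
    """
    strings = []
    for si in sst_root.findall("m:si", M):
        parts = [t.text or "" for t in si.findall("m:t", M)]
        parts += [t.text or "" for t in si.findall("m:r/m:t", M)]
        strings.append("".join(parts))
    return strings


def cell_text(cell, shared_strings):
    """Return the string held by a <c> element, or None for non-string cells."""
    kind = cell.get("t")
    if kind == "s":
        # <v> holds an index into the shared string table, not the text itself.
        return shared_strings[int(cell.find("m:v", M).text)]
    if kind == "inlineStr":
        # Inline strings keep their text inside the cell, under <is><t>.
        t = cell.find("m:is/m:t", M)
        return (t.text or "") if t is not None else ""
    return None
```

Reading is the easy direction. Working out whether an entry is still referenced after an edit is where the global search described above comes in.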
If you want to modify OOXML documents in a robust, thorough manner, you'll deal with tons of issues like this.
I've also done some work dealing with the Excel file format for a personal project (.xls, not .xlsx). I think it should be made clear (not that you're implying otherwise) that a lot of this mess isn't deliberate obfuscation on Microsoft's part.
For instance, the SharedStringTable is something that made a lot of sense when documents had to fit on floppy disks. Excel is 26 years old, and when you evolve a file format for that long while trying to maintain backwards compatibility, you'll inevitably be stuck with a messy format.
I've looked at OOXML on the Word side and I would agree that although it is human readable, the degree of cross-referencing makes understanding a typical document non-trivial. On the other hand, the technical threshold for dealing with the sort of issues you mention is lower in XML than in binary data. Given the scale at which Microsoft operates, that probably translates into higher productivity for customers who programmatically manipulate or create Office documents.
Although, all things being equal, unused entries and the potential for bloat are undesirable, eliminating them is rarely the primary goal of a project. Since compression is built into the file format, bloat probably does not count among the chief considerations of a typical project, outliers aside (even if it is aesthetically unappealing). And again, garbage collection within a binary format is an even more challenging task.