DOCX files created by LibreOffice are not only smaller but also simpler in terms of XML structure, and easier to interoperate with. A two-page file created from scratch is 1100 XML lines if written by LibreOffice and 11500 XML lines if written by Microsoft Office.
The 10400 redundant XML lines are there to make it difficult to properly read the file. Also, they may contain non-standard elements that were deprecated before the standard itself was approved but are still there 12 years later.
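For anyone who wants to reproduce a count like this: a DOCX is a ZIP archive of XML parts, and most writers emit each part on one or two physical lines, so "lines" only means something after pretty-printing. Here is a minimal sketch, assuming that is roughly how the numbers above were measured (the post does not say). `count_pretty_xml_lines` is a made-up helper name, not part of any library.

```python
import zipfile
from xml.dom import minidom


def count_pretty_xml_lines(docx_file):
    """Return {part_name: line_count} for every XML part in a DOCX package.

    `docx_file` may be a path or a binary file-like object.
    Assumption: "XML lines" means non-blank lines after pretty-printing.
    """
    counts = {}
    with zipfile.ZipFile(docx_file) as package:
        for name in package.namelist():
            # Only XML parts and relationship files; skip images, fonts, etc.
            if not name.endswith((".xml", ".rels")):
                continue
            document = minidom.parseString(package.read(name))
            pretty = document.toprettyxml(indent=" ")
            # Ignore blank lines left over from whitespace-only text nodes.
            counts[name] = sum(1 for line in pretty.splitlines() if line.strip())
    return counts
```

Calling `sum(count_pretty_xml_lines("some.docx").values())` on the same document saved by each suite gives comparable totals. The exact figures depend on the indentation and on which parts you choose to count.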
MS Office is very old. From their point of view, compatibility will always have to mean "it works as much as possible like the previous version".
So suppose you have a three-way conditional in some code: each paragraph has either "Straight", "Fast" or "Thom Hopkins" formatting. Hmm. While the XML standard is being written, you ask engineers to explain these options so you can write them up for the standard.
"Straight" and "Fast" turn out to each have six paragraph definitions. Great! Write those in the XML standard. The guy you asked to work out "Thom Hopkins" has gone on sick leave due to a mental breakdown. He has left a forty-page document, which includes excerpts from several multi-page C++ classes, one of which seems to be a partial implementation of a bin packing solver, while another involves regular expressions.
You find the supervisor of the guy who last worked on the original "Thom Hopkins" code. He explains it was developed over 15 years by a large team and was originally a core part of the document engine before the invention of the faster "Straight" paragraph mode sidelined it.
Now, you _could_ add all this crap to an appendix of the proposed XML document standard and watch a committee vomit when they try to read it, OR you could say "Thom Hopkins" is a special mode and shouldn't be used in standards-compliant documents, even though it's actually used in millions of templates for your own popular office suite. And then people will say you did it just to spite them...
I mostly agree with you. I subscribe to "don't attribute malice where you could attribute incompetence or unforeseen factors", but I keep wondering in the back of my head why they would keep all this cruft in the docx format. They hard forked their document format in 2007 and caused a LOT of headaches back then, so why not take the opportunity to streamline all this stuff, you know? Why not remove "Thom Hopkins", in the case of your story?