By accident, I saw firsthand how a simple layout, such as a page and a few paragraphs, can make you question why formats like Markdown are even possible, because the number one text processing tool would throw such a gigantuan load of crude syntax at you for a few paragraphs.
Respect to MS for keeping the lights on.
People need to understand that there is no MS format per se, but different standards from which you can choose. Years ago, when OpenDocument was fairly popular, MS was kind of hesitant to use an XML format. XML is a strict format, no matter the syntax.
And I bet that MS intended such a complicated format to prevent Open Source Projects from developing parsers and MS from losing market share this way. I bet there are considerations about such a strategy discussed at the time, buried in Archive.org.
On the other hand, MS didn't want nor see the XML chaos, which would follow later on. XML is a format, and all it demands is being formally correct. It is like Assembler, fixed instruction sets with lots of freedom, and only the computer needs to "understand" the code - if it runs, ship it.
ZEN of whatever cannot be enforced. JavaScript was once the Web's assembly language. Everything was possible, but you had to do the gruntwork and encapsulate every higher-level function in a module that consisted of hundreds of LoCs. Do in hundreds of LoCs, what a simple instruction in Python could achieve with one.
Babel came, TypeScript, and today I lost track of all the changes and features of the language and its dialects. The same goes for PHP, Java, C++, and even Python. So many features that were hyped, and you must learn this crap nevertheless, because it is valid code.
Humans cannot stand a steady state. The more you add to something, the more active and valuable it seems. I hate feature creep — kudos to all the compiler devs, who deserve credit for keeping the lights on.
> And I bet that MS intended such a complicated format to prevent Open Source Projects from developing parsers and MS from losing market share this way.
It wouldn’t surprise me at all if it simply was “the XML schema mostly follows how our implementation represents this kind of stuff”.
The source code of MS Word almost certainly has lots of now weird-looking design choices based on having to run in constrained memory. It also has dark corners for “we released a version that did this slightly different, so we have to keep supporting it”
> It wouldn’t surprise me at all if it simply was “the XML schema mostly follows how our implementation represents this kind of stuff”
That’s exactly what it was. They originally had a binary representation (.doc) which was pretty much just a straight-up dump of their internal data structures to disk. When they felt forced to make an “open” “xml-based” format, they basically converted their binary serialization to XML without changing what it represented at all. It was basically malicious compliance.
as far as I understand just parsing OOXML is by far not enough to get anywhere close to having a reasonable correct understanding of the layout of the document due to how it's "supper flexible" in ways going "beyond the OOXML standard", i.e. you still have to reverse engineer tone of things.
(i.e. they worked around the "XML is a strict format" part ;) )
or at least it was that way way back then when OOXML was new and the whole scandal about MS "happening" to not correctly implement their own standard thing was still news (so like 10+ years ago)
Respect to MS for keeping the lights on.
People need to understand that there is no MS format per se, but different standards from which you can choose. Years ago, when OpenDocument was fairly popular, MS was kind of hesitant to use an XML format. XML is a strict format, no matter the syntax.
And I bet that MS intended such a complicated format to prevent Open Source Projects from developing parsers and MS from losing market share this way. I bet there are considerations about such a strategy discussed at the time, buried in Archive.org.
On the other hand, MS didn't want nor see the XML chaos, which would follow later on. XML is a format, and all it demands is being formally correct. It is like Assembler, fixed instruction sets with lots of freedom, and only the computer needs to "understand" the code - if it runs, ship it.
ZEN of whatever cannot be enforced. JavaScript was once the Web's assembly language. Everything was possible, but you had to do the gruntwork and encapsulate every higher-level function in a module that consisted of hundreds of LoCs. Do in hundreds of LoCs, what a simple instruction in Python could achieve with one.
Babel came, TypeScript, and today I lost track of all the changes and features of the language and its dialects. The same goes for PHP, Java, C++, and even Python. So many features that were hyped, and you must learn this crap nevertheless, because it is valid code.
Humans cannot stand a steady state. The more you add to something, the more active and valuable it seems. I hate feature creep — kudos to all the compiler devs, who deserve credit for keeping the lights on.