This is a comical perspective to me. I've been ass-deep in core banking APIs where we generate service references from WSDL/XSDs. Some of the generated files run to tens of megabytes. I wouldn't even attempt to quantify the number of pages of documentation. And this is just for the mid-size US banking domain. Microsoft Office has to work literally everywhere for everything. The fact that it's only 8000 pages of documentation is likely a miracle.
If you're working with an XML schema that is served up in XSD format, using code gen is the best (only) path. I understand it's old and confusing to the new generation, but if you just do it the boomer way you can have the whole job done in like 15 minutes. Hand-coding to an XML interface would be like cutting a board with an unplugged circular saw.
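To make the code gen point concrete, here is a rough Python sketch of what working against generated bindings feels like. Everything in it is made up for illustration: the `urn:example:bank` namespace, the `Account` element, and its fields are hypothetical, and the `Account` class is a hand-written stand-in for what a generator would emit from the XSD, not the output of any particular tool.

```python
from dataclasses import dataclass
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names, invented for this sketch.
NS = {"b": "urn:example:bank"}


@dataclass
class Account:
    """Stand-in for a class a code generator would derive from the XSD."""
    id: str
    currency: str
    balance: Decimal

    @classmethod
    def from_xml(cls, elem: ET.Element) -> "Account":
        # The mapping from elements/attributes to typed fields lives in one
        # place instead of being scattered through the calling code.
        return cls(
            id=elem.attrib["id"],
            currency=elem.findtext("b:Currency", namespaces=NS),
            balance=Decimal(elem.findtext("b:Balance", namespaces=NS)),
        )


SAMPLE = """<b:Accounts xmlns:b="urn:example:bank">
  <b:Account id="A-1"><b:Currency>USD</b:Currency><b:Balance>10.50</b:Balance></b:Account>
  <b:Account id="A-2"><b:Currency>EUR</b:Currency><b:Balance>7.25</b:Balance></b:Account>
</b:Accounts>"""


def parse_accounts(xml_text: str) -> list[Account]:
    """Parse the sample document into typed Account objects."""
    root = ET.fromstring(xml_text)
    return [Account.from_xml(e) for e in root.findall("b:Account", NS)]


accounts = parse_accounts(SAMPLE)
total = sum(a.balance for a in accounts)  # plain attribute access from here on
```

With a real schema you would not write `Account` yourself; the generator produces hundreds of these, which is where the tens of megabytes come from, and the calling code only ever touches typed attributes.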
Yeah, another b(w)anker dev here. Complex XSDs seem to be the baseline in the industry as soon as the spec's role goes beyond the simple one-server, one-client use case.
One example I sometimes work with is almost 1 MB of XSDs, and that's a rather small internal data tool. They even have a RESTful JSON variant, but it's not used much, and the complexity is roughly the same (you escape namespace hell, escaping XML characters, etc., but the tooling around JSON is a bit less evolved). An XML-to-object mapping tool is a must.
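For anyone who hasn't hit it, this is roughly what the namespace and escaping overhead looks like next to JSON, using only the Python standard library. The `urn:example:tool` namespace and the record fields are assumptions made up for the sketch, not taken from any real schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical namespace, invented for this sketch.
NS = "urn:example:tool"
ET.register_namespace("t", NS)

record = {"name": "Smith & Sons <Ltd>", "amount": "12.00"}

# XML: every element name carries the namespace, and reserved characters
# in text are escaped on the way out.
root = ET.Element(f"{{{NS}}}Record")
for key, value in record.items():
    child = ET.SubElement(root, f"{{{NS}}}{key}")
    child.text = value
xml_text = ET.tostring(root, encoding="unicode")

# JSON: no namespaces, and '&' and '<' pass through untouched.
json_text = json.dumps(record)

# Reading back: the XML lookup has to repeat the namespace, the JSON one
# is a plain key access. Both recover the original string.
xml_name = ET.fromstring(xml_text).findtext(f"{{{NS}}}name")
json_name = json.loads(json_text)["name"]
```

The library handles the escaping either way, so the real cost is the qualified names on every lookup, which is exactly the part a mapping tool hides.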
While I generally agree, I don't think the author is complaining about the XML spec's complexity per se but rather that rendering the underlying structures to a page is hard.
If you have this much complexity and there is nothing you can do to reduce it, then the next best thing is to have an incredibly convenient way to stand up a perfect client on the other side of the fence within a single business day.
It's not that I agree with the characterization in the OP, that these formats are deliberately obtuse. It's that I do agree about them being obtuse, and that being able to say that you can auto-generate bindings for them doesn't actually help to make them not obtuse.
I do also think that Office should have created separate formats for project files and export files. If, for example, an RTF can hold all the formatting details of a typical Word document well enough to render it pixel-accurately, then they should have conveyed that better and promoted it as the default export format (along with the idea of an export format), rather than immediately hitting people with a popup that claims their data will be partially lost. If this does exist (just not as an RTF), the point still stands: I don't use it, nobody I know uses it, so it may as well not exist.
Current state of affairs is people passing around docx, xlsx, etc. files, which are project files, hence why they (have to) contain (fancifully) serialized application state. Imagine if people passed around PSDs rather than PNGs. Or if people passed around FLPs rather than WAVs, FLACs or MP3s. It's this separation between the features of a document / spreadsheet / presentation and the features of the authoring software that appears to be completely absent from Microsoft Office, and this is something that just based on the information I have available, MS can legitimately be faulted for. Transitioning from a bespoke binary format to an XML based format with schemas available did basically nothing to help this.
And while it might seem like I'm suggesting that export formats are cleanly definable, self-evident things, I don't mean to suggest that either. It would have had to be a business decision. Where to draw the line is a question that apparently never got debated internally, at least as far as anyone can tell from the outside in retrospect.