I'd argue that for data storage purposes, you'd like a low metadata-to-information ratio. In the examples the author gives, this seems to be the main problem: far more characters are spent on markup than on content.
Compare that to JSON or TOML, which are more human-friendly and waste fewer bytes on structure to convey the same information. When used for data storage, two XML files of the same schema describing two completely different objects are likely to share a large amount of content, which is wasteful and gets in the way.
For storage of structured data (and probably even for loosely coupled RPC) you want a format that is efficient and schema-oblivious. The bad choice 15 years ago was XML; the bad choice today is JSON (the parsing overhead is not negligible) or ProtoBufs (not schema-oblivious). Various binary formats with a JSON-like object model seem like the way to go (my choice is CBOR).
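To make the trade-off concrete, here is a minimal sketch in Python, assuming the third-party cbor2 package; the record is made up for illustration:

    import json

    import cbor2  # third-party: pip install cbor2

    # A made-up record; any JSON-compatible value works.
    record = {"id": 12345, "name": "widget", "tags": ["new", "sale"], "price": 9.99}

    as_json = json.dumps(record).encode("utf-8")
    as_cbor = cbor2.dumps(record)

    # CBOR keeps the same object model (maps, arrays, strings, numbers)
    # but encodes types and lengths in binary, so it is smaller and
    # cheaper to parse than the equivalent JSON text.
    print(len(as_json), len(as_cbor))

The point is not the exact byte counts but that you keep the schema-oblivious JSON data model while dropping the cost of parsing text.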
And then there is the EU-wide absurdity of WhateverAdES, which invariably leads to onion-like layers of XML in ASN.1, encoded as base64 in XML, wrapped in a DER-encoded CMS message...
I beg to differ. For a start, XML compresses well, and besides, storage is dirt cheap these days. XML is a better storage format because it documents what the data is (a title, a reference, etc.) as well as the data itself.
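As a toy illustration (made-up data, Python's standard gzip module), repetitive markup is exactly what general-purpose compressors eat for breakfast:

    import gzip

    # Made-up document: highly repetitive markup, as schema-driven XML tends to be.
    xml_doc = b"<book><title>Dune</title><author>Frank Herbert</author></book>" * 1000

    # The tag overhead all but disappears once compressed.
    print(len(xml_doc), len(gzip.compress(xml_doc)))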
XML does compress well as text or over the wire, but the parse trees can consume a lot of memory and processing time. At least in Perl, I've had enough scripts crash due to this overhead when implementing the common/naive solution with off-the-shelf modules. You can get around this by choosing between DOM and SAX, but I consider that a symptom of the problem: you chose XML to solve a problem, and now you have another problem to solve.
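In Python terms the workaround looks something like this sketch: stream with iterparse instead of building a full DOM, and throw subtrees away as you go. The file name, tag name, and handler are hypothetical:

    import xml.etree.ElementTree as ET

    def process(elem):
        # Hypothetical per-record handler.
        print(elem.findtext("title"))

    # Stream the document instead of loading a full DOM tree:
    # only one <record> subtree is held in memory at a time.
    for event, elem in ET.iterparse("huge.xml", events=("end",)):
        if elem.tag == "record":
            process(elem)
            elem.clear()  # drop the subtree so memory stays bounded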
I had the same problem with npm and JSON, I think: npm could not simply load a huge JSON file into memory. A huge anything can crash a naively written tool built to handle smaller instances.
That's true, but I think XML has the edge there. It has so many features, like defining new types, that you wouldn't normally see in JSON. One parser we used had a ten-to-one ratio: 50 MB of XML meant 500 MB of RAM usage with a DOM parser. And that's despite the textual representation of XML already being >50% bloat with the closing tags etc.