Well, XML is a markup language (and is really good at being that) while JSON is not. Sure, XML can be used as a poor man's data storage, as a base for a DSL, etc., but almost always there are better choices.
What are the better choices? And what were the better choices on the major platforms 10 years ago? What could have been chosen then so that we wouldn't see every app using xml config files/dsls/storage now?
I use csv when applicable. I use protobufs when applicable. But in the typical use case I choose xml because it's some config/dsl/dataset that needs to be human-editable (support comments, for example), needs more complex structure than csv supports, and should preferably not need an external library or a custom parser. Json, Csv, Toml, S-Expressions, protobufs all fail one or more of these requirements. I'm sure there are others, but none without at least one drawback I don't want.
A poor man's data storage is exactly what I want!
> preferably not need an external library or a custom parser. Json, Csv, Toml, S-Expressions, protobufs all fail one or more of these requirements.
And XML doesn't? Quite a few (not all, but quite a few nonetheless) programming languages include zero support for reading or writing XML-formatted data without using an external library or custom parser. This includes nearly all languages that predate XML, and quite a few languages that postdate it. Even when a language does have built-in (or at least in the standard library) support for XML, it's almost always a royal pain to use, especially once namespaces and schemas are involved.
Once upon a time, though, the answer was (and in a lot of places still is) INI:
- It's human-editable and supports comments
- It supports more complex structure than CSV
- Some languages have built-in support for it, and the Windows and GLib APIs support it, too (well, something similar enough to be compatible, in the latter case)
INI falls flat when you need to express deeper levels of nesting than keys and sections, though.
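For example, Python ships an INI parser in the standard library; a minimal sketch (section and key names here are made up for illustration):

```python
# INI: human-editable, supports comments, parsed with the stdlib alone.
import configparser

ini_text = """
# Full-line comments work out of the box.
[server]
host = example.com  ; inline comments need to be enabled explicitly
port = 8080

[logging]
level = debug
"""

config = configparser.ConfigParser(inline_comment_prefixes=(";", "#"))
config.read_string(ini_text)

print(config["server"]["host"])         # example.com
print(config.getint("server", "port"))  # 8080
```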
There's also YAML, which meets all your criteria about as well as XML does (at least on average; your specific language/platform might favor one or the other).
Right. Xml is to .NET what ini was for MFC. It’s what the platform “makes” you use. The same is true for json on js of course.
On a platform that has almost no support out of the box (e.g python) the choice is open. But on a platform that has a couple of formats built in, picking a format outside that platform is a pretty big step. The return needs to be substantial for a .net developer to use yaml via an external library over xml.
My reasoning in this thread has always started from the perspective that xml comes built in and almost no other format does. This is the case for e.g java and .net but not for python or C for example. But the prevalence of xml comes from java/.net so if we are to ask why, then we should consider that.
It seems like the "type: External" should line up with "metric" and "target" but no, it needs to line up with the word "external" - not the dash, but the word after a space after the dash. Using YAML frequently reminds me of the quote "Be open minded, but not so open minded that your brains fall out".
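Concretely, this is presumably the Kubernetes HorizontalPodAutoscaler metrics block (abridged here); in a YAML list item, sibling keys align with the first key after the dash, not with the dash itself:

```yaml
metrics:
- type: External     # aligned with "external" below, two columns past the dash
  external:
    metric:
      name: queue_messages_ready
    target:
      type: AverageValue
      averageValue: "30"
```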
Schemas are awesome for so many cases. Even though JSON Schema is a giant evolving clusterfuck, I still use it to be able to enforce some consistency.
And, being honest, JSON schema is better than, say, GPB or Avro schema at enforcing field relationships, e.g., "if typeId is 7, then partnerId cannot be null"
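For reference, a draft-07 sketch of that rule (the field names are taken from the comment; everything else is illustrative):

```json
{
  "type": "object",
  "properties": {
    "typeId":    { "type": "integer" },
    "partnerId": { "type": ["string", "null"] }
  },
  "if": {
    "properties": { "typeId": { "const": 7 } },
    "required": ["typeId"]
  },
  "then": {
    "properties": { "partnerId": { "type": "string" } },
    "required": ["partnerId"]
  }
}
```

The if/then keywords landed in JSON Schema draft-07; GPB and Avro schemas have no comparable cross-field conditional.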
You aren't responding to the comment here, you are just reasserting the article's position. I'd argue there is not another format that is obviously better for every data storage or exchange use case, or that surpasses all of XML's benefits while minimizing all of its downsides. I don't want to look at XML, but I do understand why it is being used.
Abuse of XML killed it as a format. JSON is absolutely shit for semantic markup, and yet developers today routinely use it for documents because "XML is bad". They contrive ridiculous schemes for adding metadata and type information. They use it to generate HTML even when HTML takes less space. Finally, we regressed from XHTML to HTML5. Bye-bye namespaces and parsing consistency.
> They use it to generate HTML even when HTML takes less space.
The fact that MobileDoc exists makes me physically ill. Something that can be expressed with one line containing a paragraph element and an italic tag is over a dozen lines of JSON spam.
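From memory (the exact encoding may differ slightly from the real MobileDoc 0.3 format, so treat this as illustrative), the comparison looks roughly like:

```html
<p>The quick <i>brown</i> fox</p>
```

versus

```json
{
  "version": "0.3.1",
  "atoms": [],
  "cards": [],
  "markups": [["i"]],
  "sections": [
    [1, "p", [
      [0, [], 0, "The quick "],
      [0, [0], 1, "brown"],
      [0, [], 0, " fox"]
    ]]
  ]
}
```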
Right, but, if given a choice of what to use, between XML and JSON, I'll pick JSON every time.
XML is a complete mess. Have you SEEN its spec?
You can put JSON's spec on a single page. XML's spec, not so much. Hell, most XML parsers don't support the full spec, and the ones which do have historically been riddled with security holes.
JSON over XML was simplicity over a crazy spec built by a bunch of companies all wanting to shove their own craziness into it.
The XML spec is longer than one page, granted, but it's about three times shorter than the YAML spec. And the XML spec describes not only the XML syntax but also a basic form of validation (DTD), which includes references. Basic XML has only five special symbols (<, >, ', ", and &) and can be parsed in linear time. (Namespaces complicate things somewhat.)
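The five escapes and a full round trip are all in the Python standard library, for example:

```python
# The five XML special characters, escaped and parsed with the stdlib.
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

raw = 'a < b & "c"'
print(escape(raw))                   # a &lt; b &amp; "c"  (&, <, > by default)
print(quoteattr(raw))                # quotes the value for use as an attribute

# Round trip through a real parser:
elem = ET.fromstring("<msg note={}/>".format(quoteattr(raw)))
assert elem.get("note") == raw
```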
That's because json's spec isn't complete. It is predicated on the language interpreting it to be able to just eval the structure and work with the data [1]
For a non human-edited data storage or exchange that’s fine. Json is worse for human editable data though. Xml might not be the best alternative there but it beats json for things like small configs.
It’s not as simple as saying “everywhere xml is used, json would be a better choice”.
Your data storage format shouldn't be human readable. The data transferred over the wire shouldn't be human readable. It should be a binary serialized data format, probably encrypted, definitely compressed.
Yes, storage is cheap. Bandwidth is not. Also, you really don't want a human who intercepts your data to be able to read it. Additionally, your data structure only makes sense in the context of your domain, which has usually been modeled in your program(s) that work in that domain, and thus it is better to deserialize it with tools that understand that domain.
If you feel the need for a general purpose deserialization protocol, there are several available - Avro/Protobuf, etc.
Binary encoded data can often be decoded without consuming the entire document. SAX-parser-like reading requires at least reading an open and a close tag before the data is useful.
String serialization is a wasteful endeavor. It makes life easy for devs because it takes one less step to read the data in a text editor or log message, but quite often requires hacks to model things like recursive or self-referential data structures, and wastes space by repeating property names constantly for every item within the serialized structure. It's predicated on four of the fallacies of distributed computing, namely - bandwidth is infinite, the network is secure, transport cost is zero, and latency is zero. It is a solution looking for a problem, and because we are lazy, we don't build tools that would make binary serialized formats just as easy to use as json/yaml/xml.
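A rough sketch of the "repeated property names" overhead (record layout and field names are made up for illustration): the same 1000 records as JSON text versus a fixed-layout binary packing:

```python
# Compare JSON text against a fixed-layout binary encoding of the
# same records, with and without compression.
import json, struct, zlib

records = [{"id": i, "price": 999, "qty": 3} for i in range(1000)]

as_json = json.dumps(records).encode()
as_binary = b"".join(struct.pack("<iii", r["id"], r["price"], r["qty"])
                     for r in records)

print(len(as_json), len(as_binary))   # binary is roughly a third of the size:
                                      # no repeated "id"/"price"/"qty" keys
print(len(zlib.compress(as_json)),
      len(zlib.compress(as_binary)))  # compression narrows the gap
```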
Markup is a mix of scalar and structured data (it's a structure discovered or associated with a scalar) and thus it contains everything it needs to express structured data alone: just remove the scalar. E.g.
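A minimal example of the kind of document discussed below (element names are taken from the points that follow; the attributes are made up):

```xml
<invoice number="1042" date="2013-05-01">
  <item sku="A17" qty="2" price="9.99"/>
  <item sku="B23" qty="1" price="4.50"/>
</invoice>
```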
Is this really a poor man's choice? And compared to JSON?! I can see at least the following advantages here:
1. Each element has explicit type name (invoice, item). JSON is "typeless", which simply means the type information travels out of band. And with XML namespaces these type names can be made globally unique, but still stay human readable.
2. Each element is self-contained, the code that produces the <item> doesn't need to know if there was an item before or after it so that it should add a separator. (The dangling comma problem in JSON.)
3. The attribute names are not just arbitrary strings as in JSON, there are strict rules of what can be in the name. They're much more suited for structured data than JSON, where you can name an attribute "foo.bar" and some JSON readers that accept a JSON "path" won't be able to find it.
4. It has less visual noise than JSON because the attribute names don't have quotes around them and you don't need to separate elements with a special symbol. Despite the common belief, well-written XML is more readable than JSON.
And we haven't even touched things like validation + extended types, references, and transformation of data.
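Points 2 and 3 are easy to demonstrate with Python's stdlib parser (the dotted-path lookup referenced in point 3 is a common but non-standard addressing convention):

```python
import json

# (2) Separators are the emitter's problem; a trailing comma is fatal.
try:
    json.loads("[1, 2, 3,]")
except json.JSONDecodeError as e:
    print("trailing comma rejected:", e)

# (3) "foo.bar" is a legal JSON key, which a dotted-path convention
# cannot distinguish from actual nesting:
doc = json.loads('{"foo.bar": 1, "foo": {"bar": 2}}')
print(doc["foo.bar"], doc["foo"]["bar"])  # 1 2
```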
Yet every time I've stumbled upon XML in the past decade or so, it's been used as a data format because it's easy to manage and supported by every platform/tool out there. But sure, let's switch over to JSON or use a SQL database because we can't deal with the fact that XML might be better suited for something it wasn't originally designed for.
It doesn’t answer the question, but I do wonder if XML would be an improvement in devops, compared to the current obsession with YAML. For everything except the part where you write it.
Make an xml stylesheet and your kubernetes cluster is instantly documented.
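As a sketch, assuming a hypothetical resource dump with <deployment name="..." replicas="..."/> elements, a minimal XSLT 1.0 stylesheet rendering it as HTML could look like:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body>
      <h1>Cluster resources</h1>
      <ul>
        <xsl:for-each select="//deployment">
          <li>
            <xsl:value-of select="@name"/>
            (<xsl:value-of select="@replicas"/> replicas)
          </li>
        </xsl:for-each>
      </ul>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
```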
Maybe if you are using Notepad. Any decent text editor will provide things like auto indentation, completion, auto end-tags, structured editing, and schema validation. For example, Emacs comes with nXML mode:
That assumes that you edit XML all day long. This is not always the case.
I am writing non-XML code most of the day, and I do not have structured editing / auto braces enabled. So when I need to edit that one XML config, I'll open it in my regular editor, which will provide at most syntax highlight, and edit it as needed with a bit of swearing. And next time, I would promise myself I'd choose a different config format which does not need special editors.
Which "same thing"? Not setting up editor and complex environment for the things I am only going to edit once or twice? Yes.
In general, when you see something inefficient, you can either fix it to make it better, or ignore and come up with random workarounds.
In my opinion, a config file which cannot be edited by hand, and which needs a special editor with a non-trivial learning curve, is an inefficiency. I can either ignore it and set up the specialized tools; or I can fix it, by ripping out XML and replacing it with something more human-editable, like TOML or YAML. In large teams, it is almost always better to fix it -- sure, I will spend a few hours getting rid of XML, but it will pay off in the long term, as no one else will have to bother with the special setup anymore.
(This obviously only applies to systems where XML is a minor part, like a single configuration file. If your system has a huge amount of XML, you'd better learn the right tools.)
I don't understand, what is there to set up? With the Emacs mode I gave as an example, you just open an XML file and everything is there. Any decent text editor will have XML support.
xmllint --noout <file> will check the file and report any issues with XML syntax in a very detailed way with line numbers for you to see.
I myself don't even use syntax highlighting and normally work in vim and although I do make errors in XML sometimes, I find that I make at least as many syntactic errors in Python or C code that I have to weed out before I can proceed. But I never heard anyone complaining about Python or C being too strict :)
They all are, but xml isn’t the worst. (Json and S-expressions are worse, for example).
Not even formats designed for human consumption such as yaml are very good. The good ones for editing (toml, csv, ini) fall short when it comes to complex structure instead. There is no silver bullet.