Well, XML is a markup language (and is really good at being that) while JSON is not. Sure, XML can be used as a poor man's data storage, as a base for a DSL, etc., but almost always there are better choices.
What are the better choices? And what were the better choices on the major platforms 10 years ago? What could have been chosen then so that we wouldn't see every app using xml config files/dsls/storage now?
I use csv when applicable. I use protobufs when applicable. But in the typical use case I choose xml because it's some config/dsl/dataset that needs to be human-editable (support comments, for example), needs more complex structure than csv supports, and should preferably not need an external library or a custom parser. Json, Csv, Toml, S-Expressions, protobufs all fail one or more of these requirements. I'm sure there are others, but none without at least one drawback I don't want.
A poor man's data storage is exactly what I want!
> preferably not need an external library or a custom parser. Json, Csv, Toml, S-Expressions, protobufs all fail one or more of these requirements.
And XML doesn't? Quite a few (not all, but quite a few nonetheless) programming languages include zero support for reading or writing XML-formatted data without using an external library or custom parser. This includes nearly all languages that predate XML, and quite a few languages that postdate it. Even when a language does have built-in (or at least in the standard library) support for XML, it's almost always a royal pain to use, especially once namespaces and schemas are involved.
Once upon a time, though, the answer was (and in a lot of places still is) INI:
- It's human-editable and supports comments
- It supports more complex structure than CSV
- Some languages have built-in support for it, and the Windows and GLib APIs support it, too (well, something similar enough to be compatible, in the latter case)
INI falls flat when you need to express deeper levels of nesting than keys and sections, though.
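For example, Python ships an INI parser in the standard library; a minimal sketch (section and key names here are made up for illustration):

```python
# INI: human-editable, supports comments, parsed with the stdlib alone.
import configparser

ini_text = """
# Full-line comments work out of the box.
[server]
host = example.com  ; inline comments need to be enabled explicitly
port = 8080

[logging]
level = debug
"""

config = configparser.ConfigParser(inline_comment_prefixes=(";", "#"))
config.read_string(ini_text)

print(config["server"]["host"])         # example.com
print(config.getint("server", "port"))  # 8080
```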
There's also YAML, which meets all your criteria about as well as XML does (at least on average; your specific language/platform might favor one or the other).
Right. Xml is to .NET what ini was for MFC. It’s what the platform “makes” you use. The same is true for json on js of course.
On a platform that has almost no support out of the box (e.g python) the choice is open. But on a platform that has a couple of formats built in, picking a format outside that platform is a pretty big step. The return needs to be substantial for a .net developer to use yaml via an external library over xml.
My reasoning in this thread has always started from the perspective that xml comes built in and almost no other format does. This is the case for e.g java and .net but not for python or C for example. But the prevalence of xml comes from java/.net so if we are to ask why, then we should consider that.
It seems like the "type: External" should line up with "metric" and "target" but no, it needs to line up with the word "external" - not the dash, but the word after a space after the dash. Using YAML frequently reminds me of the quote "Be open minded, but not so open minded that your brains fall out".
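Concretely, this is presumably the Kubernetes HorizontalPodAutoscaler metrics block (abridged here); in a YAML list item, sibling keys align with the first key after the dash, not with the dash itself:

```yaml
metrics:
- type: External     # aligned with "external" below, two columns past the dash
  external:
    metric:
      name: queue_messages_ready
    target:
      type: AverageValue
      averageValue: "30"
```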
Schemas are awesome for so many cases. Even though JSON Schema is a giant evolving clusterfuck, I still use it to be able to enforce some consistency.
And, being honest, JSON schema is better than, say, GPB or Avro schema at enforcing field relationships, e.g., "if typeId is 7, then partnerId cannot be null"
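For reference, a draft-07 sketch of that rule (the field names are taken from the comment; everything else is illustrative):

```json
{
  "type": "object",
  "properties": {
    "typeId":    { "type": "integer" },
    "partnerId": { "type": ["string", "null"] }
  },
  "if": {
    "properties": { "typeId": { "const": 7 } },
    "required": ["typeId"]
  },
  "then": {
    "properties": { "partnerId": { "type": "string" } },
    "required": ["partnerId"]
  }
}
```

The if/then keywords landed in JSON Schema draft-07; GPB and Avro schemas have no comparable cross-field conditional.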
You aren't responding to the comment here, you are just reasserting the article's position. I'd argue there is not another format that is obviously better for every data storage or exchange use case, or that surpasses all of XML's benefits while minimizing all of its downsides. I don't want to look at XML, but I do understand why it is being used.
Abuse of XML killed it as a format. JSON is absolutely shit for semantic markup, and yet developers today routinely use it for documents because "XML is bad". They contrive ridiculous schemes for adding metadata and type information. They use it to generate HTML even when HTML takes less space. Finally, we regressed from XHTML to HTML5. Bye-bye namespaces and parsing consistency.
> They use it to generate HTML even when HTML takes less space.
The fact that MobileDoc exists makes me physically ill. Something that can be expressed with one line containing a paragraph element and an italic tag is over a dozen lines of JSON spam.
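From memory (the exact encoding may differ slightly from the real MobileDoc 0.3 format, so treat this as illustrative), the comparison looks roughly like:

```html
<p>The quick <i>brown</i> fox</p>
```

versus

```json
{
  "version": "0.3.1",
  "atoms": [],
  "cards": [],
  "markups": [["i"]],
  "sections": [
    [1, "p", [
      [0, [], 0, "The quick "],
      [0, [0], 1, "brown"],
      [0, [], 0, " fox"]
    ]]
  ]
}
```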
Right, but, if given a choice of what to use, between XML and JSON, I'll pick JSON every time.
XML is a complete mess. Have you SEEN its spec?
You can put JSON's spec on a single page. XML's spec, not so much. Hell, most XML parsers don't support the full spec, and the ones which do have historically been riddled with security holes.
JSON over XML was simplicity over a crazy spec built by a bunch of companies all wanting to shove their own craziness into it.
The XML spec is longer than one page, granted, but it's about three times shorter than the YAML spec. And the XML spec describes not only the XML syntax but also a basic form of validation (DTD), which includes references. Basic XML has only five special symbols (<, >, ', ", and &) and can be parsed in linear time. (Namespaces complicate things somewhat.)
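The five escapes and a full round trip are all in the Python standard library, for example:

```python
# The five XML special characters, escaped and parsed with the stdlib.
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

raw = 'a < b & "c"'
print(escape(raw))                   # a &lt; b &amp; "c"  (&, <, > by default)
print(quoteattr(raw))                # quotes the value for use as an attribute

# Round trip through a real parser:
elem = ET.fromstring("<msg note={}/>".format(quoteattr(raw)))
assert elem.get("note") == raw
```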
That's because json's spec isn't complete. It is predicated on the language interpreting it to be able to just eval the structure and work with the data [1]
For a non human-edited data storage or exchange that’s fine. Json is worse for human editable data though. Xml might not be the best alternative there but it beats json for things like small configs.
It’s not as simple as saying “everywhere xml is used, json would be a better choice”.
Your data storage format shouldn't be human readable. The data transferred over the wire shouldn't be human readable. It should be a binary serialized data format, probably encrypted, definitely compressed.
Yes, storage is cheap. Bandwidth is not. Also, you really don't want a human who intercepts your data to be able to read it. Additionally, your data structure only makes sense in the context of your domain, which has usually been modeled in your program(s) that work in that domain, and thus it is better to deserialize it with tools that understand that domain.
If you feel the need for a general purpose deserialization protocol, there are several available - Avro/Protobuf, etc.
Binary encoded data can often be decoded without consuming the entire document. SAX-parser-like reading requires at least reading an open and a close tag before the data is useful.
String serialization is a wasteful endeavor. It makes life easy for devs because it takes one less step to read the data in a text editor or log message, but quite often requires hacks to model things like recursive or self-referential data structures, and wastes space by repeating property names constantly for every item within the serialized structure. It's predicated on four of the fallacies of distributed computing, namely - bandwidth is infinite, the network is secure, transport cost is zero, and latency is zero. It is a solution looking for a problem, and because we are lazy, we don't build tools that would make binary serialized formats just as easy to use as json/yaml/xml.
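A rough sketch of the "repeated property names" overhead (record layout and field names are made up for illustration): the same 1000 records as JSON text versus a fixed-layout binary packing:

```python
# Compare JSON text against a fixed-layout binary encoding of the
# same records, with and without compression.
import json, struct, zlib

records = [{"id": i, "price": 999, "qty": 3} for i in range(1000)]

as_json = json.dumps(records).encode()
as_binary = b"".join(struct.pack("<iii", r["id"], r["price"], r["qty"])
                     for r in records)

print(len(as_json), len(as_binary))   # binary is roughly a third of the size:
                                      # no repeated "id"/"price"/"qty" keys
print(len(zlib.compress(as_json)),
      len(zlib.compress(as_binary)))  # compression narrows the gap
```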
Markup is a mix of scalar and structured data (it's a structure discovered or associated with a scalar) and thus it contains everything it needs to express structured data alone: just remove the scalar. E.g.
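A minimal example of the kind of document discussed below (element names are taken from the points that follow; the attributes are made up):

```xml
<invoice number="1042" date="2013-05-01">
  <item sku="A17" qty="2" price="9.99"/>
  <item sku="B23" qty="1" price="4.50"/>
</invoice>
```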
Is this really a poor man's choice? And compared to JSON?! I can see at least the following advantages here:
1. Each element has explicit type name (invoice, item). JSON is "typeless", which simply means the type information travels out of band. And with XML namespaces these type names can be made globally unique, but still stay human readable.
2. Each element is self-contained, the code that produces the <item> doesn't need to know if there was an item before or after it so that it should add a separator. (The dangling comma problem in JSON.)
3. The attribute names are not just arbitrary strings as in JSON, there are strict rules of what can be in the name. They're much more suited for structured data than JSON, where you can name an attribute "foo.bar" and some JSON readers that accept a JSON "path" won't be able to find it.
4. It has less visual noise than JSON because the attribute names don't have quotes around them and you don't need to separate elements with a special symbol. Despite the common belief, well-written XML is more readable than JSON.
And we haven't even touched things like validation + extended types, references, and transformation of data.
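Points 2 and 3 are easy to demonstrate with Python's stdlib parser (the dotted-path lookup referenced in point 3 is a common but non-standard addressing convention):

```python
import json

# (2) Separators are the emitter's problem; a trailing comma is fatal.
try:
    json.loads("[1, 2, 3,]")
except json.JSONDecodeError as e:
    print("trailing comma rejected:", e)

# (3) "foo.bar" is a legal JSON key, which a dotted-path convention
# cannot distinguish from actual nesting:
doc = json.loads('{"foo.bar": 1, "foo": {"bar": 2}}')
print(doc["foo.bar"], doc["foo"]["bar"])  # 1 2
```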
Yet every time I've stumbled upon XML in the past decade or so, it's been used as a data format because it's easy to manage and supported by every platform/tool out there. But sure, let's switch over to JSON or use a SQL database because we can't deal with the fact that XML might be better suited for something it wasn't originally designed for.
It doesn’t answer the question, but I do wonder if XML would be an improvement in devops, compared to the current obsession with YAML. For everything except the part where you write it.
Make an xml stylesheet and your kubernetes cluster is instantly documented.
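As a sketch, assuming a hypothetical resource dump with <deployment name="..." replicas="..."/> elements, a minimal XSLT 1.0 stylesheet rendering it as HTML could look like:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body>
      <h1>Cluster resources</h1>
      <ul>
        <xsl:for-each select="//deployment">
          <li>
            <xsl:value-of select="@name"/>
            (<xsl:value-of select="@replicas"/> replicas)
          </li>
        </xsl:for-each>
      </ul>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
```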
Maybe if you are using Notepad. Any decent text editor will provide things like auto indentation, completion, auto end-tags, structured editing, and schema validation. For example, Emacs comes with nXML mode:
That assumes that you edit XML all day long. This is not always the case.
I am writing non-XML code most of the day, and I do not have structured editing / auto braces enabled. So when I need to edit that one XML config, I'll open it in my regular editor, which will provide at most syntax highlight, and edit it as needed with a bit of swearing. And next time, I would promise myself I'd choose a different config format which does not need special editors.
Which "same thing"? Not setting up editor and complex environment for the things I am only going to edit once or twice? Yes.
In general, when you see something inefficient, you can either fix it to make it better, or ignore and come up with random workarounds.
In my opinion, a config file which cannot be edited by hand, and which needs a special editor with a non-trivial learning curve, is an inefficiency. I can either ignore it and set up the specialized tools; or I can fix it, by ripping out XML and replacing it with something more human-editable, like TOML or YAML. In large teams, it is almost always better to fix it -- sure, I will spend a few hours getting rid of XML, but it will pay off in the long term, as no one else will have to bother with the special setup anymore.
(This obviously only applies to systems where XML is a minor part, like a single configuration file. If your system has a huge amount of XML, you'd better learn the right tools.)
I don't understand, what is there to set up? With the Emacs mode I gave as an example, you just open an XML file and everything is there. Any decent text editor will have XML support.
xmllint --noout <file> will check the file and report any issues with XML syntax in a very detailed way with line numbers for you to see.
I myself don't even use syntax highlighting and normally work in vim and although I do make errors in XML sometimes, I find that I make at least as many syntactic errors in Python or C code that I have to weed out before I can proceed. But I never heard anyone complaining about Python or C being too strict :)
They all are, but xml isn’t the worst. (Json and S-expressions are worse, for example).
Not even formats designed for human consumption such as yaml are very good. The good ones for editing (toml, csv, ini) fall short when it comes to complex structure instead. There is no silver bullet.