XML: Contrary to popular belief, it doesn't always kill babies

Groxx · on Feb 19, 2011

>There just isn't a really nice way to represent a tree structure in something like JSON.

What? They're trivially identical! Here, an algorithm to convert from XML to JSON:

  < => {
  > => }
  attribute="value" => key:"value"
  contents =>  $:[{},"",{},etc,"look ma, multiple text nodes!"]
  node name => #:"name"

Actually, there's one that most JSON parsers can do that XML can't:

  key: function(){alert("Oh joy, someone used <em>eval!</em>");} => ???

edit: thought of another one:

  {"spaces in <key name{w00t}>":value} => ?

I have absolutely no idea how people come to the conclusion that XML somehow magically does X while nothing else does. XSLT can be modified to work on JSON, schema definitions too (how about: http://tools.ietf.org/html/draft-zyp-json-schema-03 , or a pre-defined set of rules like JSON-RPC?), and XPath == CSS-like selectors with a different set of characters.

The only difference between XML and JSON is the character set and the tool-chains available. JSON is rapidly gaining the same capabilities through tools, while XML has tons of non-compliant libraries (I've had to deal with other people's XML APIs in cases where attribute order mattered (likely hand-rolled, I know), or they couldn't parse a standard schema doc, or they couldn't handle namespaces properly).

About the only thing you can say is that JSON is XML "lite", currently, and is a new chance to do things differently / correctly. The problems many people have with XML relates to just that - non-compliant libraries that clog the XML tubes, making working with it a total crap-shoot often enough that it's worth avoiding.

jerf · on Feb 20, 2011

"key: function(){alert("Oh joy, someone used <em>eval!</em>");}"

That is not JSON. JSON is a rigidly specified serialization format as described at http://json.org , and exactly that. It is not merely "whatever some Javascript intepreter will accept", because that is a terrible data exchange format. Functions are not allowed in JSON.

Also you forgot to double-quote your key, which is required. In fact using single-quotes for keys in JSON is forbidden, even though Javascript permits it. But this is less important than claiming JSON can carry function definitions.

"XSLT can be modified to work on JSON, ..."

Anything can be made to work on anything. XML does have a large stack of standards that contains stuff that already is defined and exists, so you don't have to do the job of making the things that work; you just go grab a parser and an XSLT transformer, rather than writing it yourself. My personal favorite is namespaces, though nobody ever uses them correctly and consequently they're much less useful than I'd like. That's what it means to say XML does something when JSON does not; Turing complete languages can accomplish any task on either format, but you can't just go grab things for JSON as easily. You mention non-compliant libraries which absolutely is a big problem, but compliant libraries can be found.

This is correctable over time, certainly, but bear in mind that the very act of correcting JSON will turn it into the same monster that XML is, albeit perhaps a bit simpler. A JSON schema language will be the same pile of complicated stuff that the XML schema languages are. The problem is the complicated thing there, not the solution, and your solution may not be simpler than the problem. Some of the other "XML problems" are of the same nature; the problems with SOAP, for instance, have almost nothing to do with XML and everything to do with trying to be The Ultimate Solution to Everything Everywhere and consequently being a complicated pile of useless garbage.

Don't mistake me for an XML partisan. I say use JSON now unless you are actually dealing with marked-up text, in which case XML is superior. But if you're going to make good decisions about what to use, you need to understand both very well and the reasons behind that rule-of-thumb I give, not "XML bad, JSON good."

Groxx · on Feb 20, 2011

Agreed. But isn't a language defined largely by what people are willing to do with it? XML has some wonky specifications that ultimately mean parsing XML is entirely indeterminate - external, over-http schemas, for instance. If every JSON parser in the world accepted single-quotes for keys, wouldn't that imply JSON supports single-quotes for keys? Would every subsequent parser that supported single-quoted keys be wrong for doing so? Same thing for namespaces: because people have abused them so horribly, they're almost useless.

XML libraries have been around for a long time. As have problems with XML libraries. Apparently it's not correctable over time, there's too much legacy to fight against.

---

>I say use JSON now unless you are actually dealing with marked-up text, in which case XML is superior.

I've heard that argument before. How, precisely, is it superior? It's denser than would be doable in my example, but is denser better? A lot of the arguments around both languages revolve around human vs computer readability. Is <em>text</em> more computer-readable than {#:"em", $:"text"} ? A bit more human-readable, yes, but I'll take "text" over either as often as I can.

true_religion · on Feb 20, 2011

> But isn't a language defined largely by what people are willing to do with it?

This isn't true for a formalized language. Heck, it isn't even true for FRENCH where there is an official bureau that determines what is "proper" french, and any deviations from it are not part of the standard language.

> If every JSON parser in the world accepted single-quotes for keys, wouldn't that imply JSON supports single-quotes for keys?

It might in the absence of a formal spec, but not all JSON parsers accept single quotes.

For example, the official Python JSON parser does not.

>>> import json >>> json.loads("{'foo':'bar'}") ValueError: Expecting property name: line 1 column 1 (char 1)

I question how many other highly popular languages have JSON parsers built to the spec.

Groxx · on Feb 20, 2011

Tons of highly popular languages don't have XML parsers built to the spec. Very few of them can handle schemas at all, or recursive definitions in schemas, or external document types, or external entities or schemas (and the few that can are usually disabled intentionally, because they can make parsing a tiny XML file take minutes if the server of the schema doesn't serve it up fast enough). That the XML spec defines such things is irrelevant if they're wholly unreliable because enough parsers refuse to handle them. And there are more than enough horror stories out there to argue that a significant amount of systems don't handle to-spec XML.

There are plenty of RFCs out there that define things that no longer practically exist - should they be used, because the spec defines something that you need? Especially if it's meant to be used as an interchange format?

saurik · on Feb 20, 2011

And if JSON did all of those things the libraries that implemented it would suck as well: the complexity comes from the semantics of what all of those things are and do (schema validation is a complex problem, and that has nothing to do with whether you are validating XML or JSON), not how to parse a silly file format.

Hell: a lot of highly popular languages, to be quite honest for a minute, probably don't even have compliant JSON parsers. The real problem here is cowboy developers who think "oh, JSON/XML/whatever is easy, I can throw one of those together in a few minutes" and then /don't even read the spec/ before committing their project into some poor language's standard library.

Homunculiheaded · on Feb 19, 2011

I once explored the topic of "xml is just verbose s-exps" with someone who had no lisp experience. His first response was that it would be impossible to just use lisp in place of xml because you would then have to implement a minimal lisp parser into any language that wanted to make use of data stored in s-exps. I then pointed out that you have to do the same thing for xml or json, it's just that we've become used to standard tools existing to do this.

That's actually where I find that XML is the most dangerous. People just take it as a fact of the universe that XML exists and make so many assumptions about how it solves a particular problem that they don't even stop to think "could there be a better way?"

GFischer · on Feb 20, 2011

I'll admit my ignorance about the existence of S-expressions (had to look them up here: http://en.wikipedia.org/wiki/S-expression )

I've used XML because .NET has a parser, and I'll probably use JSON because parsers are widespread... are there good standard parsers for S-expressions for .NET or other popular platforms?

A quick googling sends me here

http://stackoverflow.com/questions/3051254/parsing-lisp-s-ex...

which makes me believe.NET can't natively process s-expressions with ease, and that's a HUGE minus for me.

dfox · on Feb 20, 2011

Few lines of code in the last answer is actually everything you need to parse (reasonable subset of) S-expressions. So there probably isn't any general purpose programming language that cannot handle S-expressions with ease.

That holds if you want to use S-expressions as data exchange format (or like network protocol as is done by Subversion). Parsing Common Lisp source is entirely different matter.

GFischer · on Feb 20, 2011

It does seem you can handle them, but we .NET people are lazy (as in, Larry Wall lazy http://en.wikipedia.org/wiki/Larry_Wall ).

I agree with the poster's reply to that: "Yes I agree that this is simple and fast, but.. What I really want is a 1-D list of my own object types." I get that when using XML web services with .NET (though mostly when consumed with other .NET or similar languages !!! which goes against it being an universal data transfer language )

Groxx · on Feb 19, 2011

Exactly. XML is a format, nothing more; it's the culture and everything around it that makes it a problem.

saurik · on Feb 20, 2011

> That's actually where I find that XML is the most dangerous. People just take it as a fact of the universe that XML exists and make so many assumptions about how it solves a particular problem that they don't even stop to think "could there be a better way?"

Unfortunately, even when presented with numerous implementations of interesting algorithms like schema validators, tree transformation tools, etc. that /happen/ to use XML (for our purposes, a specific verbose dialect of s-expressions) as an interchange format, the entirely opposite problem occurs: people ask the question "could there be a better way?" needlessly, and then reinvent some subset of all of that tested and working code based on around something they made up, or even something people generally like (JSON), and throw away all of that existing ecosystem; and, in the end, the only way in which it is better is that it isn't XML.

6ren · on Feb 20, 2011

    {
      "name":"Product",
      "properties":{
        "id":{
          "type":"number",
          "description":"Product identifier",
          "required":true
        },
      ...

Though the json representation of json-schema is nicer than XML, I think this approach to a schema language is just as bad as XML Schema. It even looks like they're trying to replicate the features of XS. Which might make sense from an adoption point of view, but ...

There isn't that much difference between hierarchical data formats. Even here, some people favour s-exps or YAML over JSON. So once one gets established it's difficult to dislodge due to standardization, network effects, tools, user familiarity etc. However, it can change when there is a new platform, which starts fresh with a different format.

JSON already owns browser communication (though note the pesky X in AJAX...), since browsers already use JavaScript, but it seems unlikely to usurp XML much further (exception: JS becomes prevalent on the server-side).

nivertech · on Feb 20, 2011

The better schema for JS - should be staticly typed classes/objects.

> though note the pesky X in AJAX... it also sometimes called AJAJ

fizx · on Feb 20, 2011

Two comments:

1. You forgot namespaces. Gotta put SVG in your XHTML.

2. By the time you've done your translation, you've created json that is uglier and harder for a human to visually parse than the equivalent xml.

It turns out that json is useful when you don't want xml's features. Trying to shoehorn every one of xml's features into json creates another brand of hell.

Groxx · on Feb 20, 2011

1) Namespace complaints pop up every time JSON debates pop up. XML handles them two ways, and their equivalents, fully within what I already defined:

  ns:node => {#:"ns:node"}
  xmlns="http://www.w3.org/1999/xhtml" =>
    {"xmlns":"http://www.w3.org/1999/xhtml"}

ie, it's already handled. Namespaces are a parser's job for interpretation, the notation is simple.

2) The point wasn't to say JSON is better than XML - apparently you didn't read my second sentence. I said they're trivially identical - they're the same thing, with the same capabilities, but with minimally different ways of representing the same data. The point is that claiming XML can do X because it's XML and JSON can't because it's not XML is complete nonsense. All the schema / XSLT / etc tools are just that - external tools, algorithms to run against data, which have nothing to do with how the data is transmitted.

saurik · on Feb 20, 2011

The key problem I have with your #2 is that it ignores what people actually mean when they say "can't". When I say I can't lift that rock over there, I don't mean that if I went and trained for a few years I wouldn't be able to do it: I mean that if I went over there right now and tried to lift it I would fail. When you realize that this is what normal people mean by "can't", suddenly concepts like "XML can do something JSON can't" make perfect sense: there exists a tool/library that makes that happen right now, and for JSON it is some theory that you have that would require sitting down and spending a bunch of time coding. (And no: those people don't "actually believe" that JSON could never be made to do those things ever: the more you interrogate them on this matter the more it will seem that way as they are going to keep saying "can't" and it is going to keep meaning something different from the way you are using it.)

thurn · on Feb 20, 2011

I was more trying to say that XML is better for heterogenous trees because XML's default is a tree structure. XML parsers default to a tree representation of your data (we use python's ElementTree, for example), JSON parsers don't. XPath is designed to be a tree navigation language. CSS selectors are... also designed for navigating XML. No equivalent exists for JSON that I know of.

Obviously, we could use JSON or YAML or S-Expressions for representing something like a parse tree, but none of them have the same robust tool chain for tree manipulation that XML has.

Groxx · on Feb 20, 2011

>JSON parsers don't

How could they not? It's JavaScript Object Notation. Objects which refer to sub-objects form a tree.

As to CSS selectors for JSON: the same functionality can exist, because as per above they have literally identical capabilities for representing data. I'll readily admit though, I've not seen one (though one of my side-projects could probably be tweaked to do this; maybe I will).

I have nothing against XML. It's useful, widely supported, and has quite possibly more tools than any other interchange format. The problem isn't with XML, it's with the inconsistencies of (and between) those tools that make the whole interchange part of XML with specs (the only really useful part - it's a horribly wasteful storage format) almost a moot point. And due to legacy systems, it's unlikely to change any time soon, or perhaps ever.

So a clean break is likely necessary to really improve things.

nailer · on Feb 20, 2011

FYI if you use elementree, try lxml. It hasx path support, which you'll wonder how you ever did without.

tsaixingwei · on Feb 20, 2011

I find that one thing you can do in XML but not in JSON is comments. There isn't any equivalent of XML Comments in JSON. Frequently, my configuration files require comments to show what options are available, or to easily turn off an option temporarily by commenting it out. I don't see how to do that if my configuration files are in JSON.

Groxx · on Feb 20, 2011

I'd imagine it depends on your parser, and if non-browser JSON gets more prevalent it'll probably be addressed. JSON has no format specified that I'm aware of, but some browser-like parsers will handle things like this:

  {"key":"value" // comment
  or 
  {
    // comment
    "key":"value"

Which I find wholly acceptable (though poorly supported).

tsaixingwei · on Feb 20, 2011

Wouldn't that only work if the json/javascript is pretty-printed and not minified?

I'd think this problem of comments in JSON requires a solution that is part of the JSON specifications rather than dependent on individual parsers implementations.

Or perhaps JSON only really shines as a data interchange format and not much else.

nailer · on Feb 20, 2011

Sorry, on iPad and accidentally downmodded your useful comment.

nivertech · on Feb 20, 2011

>and XPath == CSS-like selectors with a different set of characters.

I also have a feeling, that jQuery reinvents XPath.

Groxx · on Feb 20, 2011

Not if you take jQuery and compare it to XML library X, because that's the actual comparison. XPath == Sizzle, not jQuery - jQuery adds functions to manipulate what the selector returned, which the selector really has no business doing.

nivertech · on Feb 20, 2011

What in jQuery is a chain of multiple selectors and filters, can be done using single query in XPath, i.e.

  "document()/html//div[class='title']//img[class='icon']/@src"

Groxx · on Feb 20, 2011

I gather you haven't done much jQuery coding. That's equivalent to:

  $("html div.title img.icon").attr("src")

The last .attr("src") is purely because jQuery always deals with a node / a collection of nodes.

You can also do arbitrary attribute searches:

  $("html div[class~='title'] img[src*='google']")
  // "~=" == "contains word" - space-delimited to match html classes
  // "*=" == "contains substring"

http://api.jquery.com/category/selectors/

nailer · on Feb 20, 2011

You're right, but jquery doesn't handle XML namespaces. Alas I wish it did.

snprbob86 · on Feb 19, 2011

I have a strict NO XML policy in any codebase that I control.

It's not because I think that XML necessarily always kills babies, it's because I think that XML has never once been the best choice. "Does everything well enough, but I don't need to learn or develop a new technology" is not a valid reason to use it and there are plenty of valid reasons not to use it.

That said, XML has killed every baby I've ever exposed to it.

zdw · on Feb 19, 2011

Does this go along with your strict "NO WINDOWS" and "NO FUN" policies as well?

Seriously, there are places where XML is the right tool, which the poster does a fine job of bringing up.

There are a lot of places where it is not, but was used because something better for those specific use cases hadn't been invented yet - JSON is barely 10 years old, whereas XML's SGML roots go back over 30.

Whatever the case, we're better off with ANY structured data format than old delimited and fixed column width formats from the dinosaur days.

beej71 · on Feb 19, 2011

I've exposed it to the "structured document markup" baby, and it has cared for it quite nicely.

bpodgursky · on Feb 19, 2011

Do you count even things like ant buildfiles? I won't say that editing them is fun, but it's no worse than configuring any other build system.

xiongchiamiov · on Feb 20, 2011

Really? In my (admittedly limited) experience, ant was much more of a PITA than things like Rake and Scons.

beagle3 · on Feb 20, 2011

Well, Java is worse than most things, but even for Java - djb redo, recently implemented by Avery Pennarun (apenwarr) is better than everything else.

endtime · on Feb 20, 2011

If you're using Java, you've already given up on not killing babies.

DjDarkman · on Feb 19, 2011

> Computers adore XML. It's nice and easy to parse, so structured and precise. Computers love that shit.

It's not the easy to parse, without interpreting complex schemas that are sometimes missing. It's not even easy to map, you have the attributes, the children and inner text, this makes it a pain to map to native objects.

There may be good tools for this, but I wouldn't call parsing it easy especially compared to other formats out there.

_delirium · on Feb 20, 2011

The fact that you can't parse it with standard parsing tools doesn't seem like a win on the parseability front either. Not its biggest problem, but nothing in this use-case made it necessary to define a syntax that tools like lex/yacc can't parse. The core problem is that matching up start/end tags can't be done in a context-free language unless there are a finite number of tags (it's isomorphic to the problem of checking for palindromes). You'd need a parser-generator that lets you do backreferences (which makes it not context-free), so you could write a definition along the lines of:

  element = '<' + element_name + '>' + element + '</' + $1 + '>'
          | nil

Either that, or a hand-rolled parser, which is in practice what XML parsers are.

Homunculiheaded · on Feb 19, 2011

The more I work with XML, the more I'm convinced that Naggum was dead on: http://www.schnada.de/grapt/eriknaggum-xmlrant.html

XML is, for me, the classic example of the problem of reinventing things badly. The sad thing is that we've barely been programming for 60 years and we've already reached the point where much of our work is wasted in this process.

andolanra · on Feb 20, 2011

"The essence of XML is this: the problem it solves is not hard, and it does not solve the problem well." —Philip Wadler, "The Essence of XML," presented at POPL 2003

And he knows what he was talking about—he was one of the people who worked on XQuery, after all.

billsix · on Feb 20, 2011

A quote of his: "XML is a giant step in no direction at all"

krig · on Feb 19, 2011

> Microsoft might not be totally crazy for basing their docx file format on XML. Honestly, I'd expect them to be uniquely qualified by years of hindsight, shame, and regret to design a document format.

I don't know if unparallelled failure is a good indicator of design ability...

It's amusing though, pretty much the only area where the article actually claims XML doesn't kill babies is in representing trees. Ehm, how about sexprs for hierarchial data representation?

My personal opinion is that XML always kills the baby, and nothing I've experienced has made me reconsider that assessment.

nivertech · on Feb 19, 2011

For somebody who want to make XML shorter and more human-readable:

1. Use more attributes, i.e.:

  <employee first="Steve" last="Jobs/>

instead of

  <employee>
     <first>Steve</first>
     <last>Jobs</last>
  </employee>

2. Use Camel case instead of '_', i.e.:

  <MyElementName>XXX</MyElementName>

instead of:

  <my_element_name>XXX</my_element_name>

3. Use sane namespace prefixes.

4. Write XSD schema (or generate it from example XML document) - Tools like XMLSpy really helpfull in filling complex XMLs.

cperciva · on Feb 20, 2011

Some of us find unix_style names far more readable than CamelCase. Underscores are almost spaces and make it much easier to recognize where each word starts.

nailer · on Feb 20, 2011

Wouldn't unix style be uxixstyle?

Groxx · on Feb 19, 2011

1 is a good example of something I see people doing wrong with XML on a regular basis. If there can't be a list of items, it should be an attribute, not a node.

nivertech · on Feb 19, 2011

Actually it's abuse of attributes, since attributes must be metadata only, while data itself should be in element content. But it makes large XML files much more readable ...

Groxx · on Feb 19, 2011

Only because you don't tend to find XML tidiers that display like this:

  <node_name
    attr="value"
    attr2="value2">

always this:

  <node_name attr="value" attr2="value2" etc="etc" ad="infinitum" until="you have to scroll">

edit:

On a <person>, how do you define what's data and what's metadata? Is a name data or metadata? What's the difference between data and metadata in the first place, in a concrete, testable definition? Google define:metadata says:

  data about data; "a library catalog is metadata because it describes publications"

So a library catalog should be like this?

  <library name="of congress"
    978_3_16_148410_0="<data title=\"\">contents</data>" 
    978_3_16_148410_1="<data title=\"\">contents</data>"
    etc >

(intentionally reducto-ad-absurdum)

udp · on Feb 19, 2011

Nicely unbiased. Not sure about the title, though, since the article is (quite rightly) more about why not to use XML.

fedd · on Feb 19, 2011

will someone write that java or at least jvm doesn't kill babies?

i remember how someone saw an xml config in my java-based software demo and judged it's innovatiiveness based on it. and it was not even mine, it was standard web.xml!..

i think the main thing is not be religious about formats.

bpodgursky · on Feb 19, 2011

If they're dumb enough to judge the innovativeness of software based on something like that, their opinion is probably irrelevant.

nailer · on Feb 20, 2011

If the software doesn't care about ease of configuration, what else doesn't it care about?

fedd · on Feb 19, 2011

thanks, just hope they're not consulting investors that seem to tend to love simplified statements

wallflower · on Feb 20, 2011

Java has won. It is the COBOL of our generation. The reason why people hate Java is because they don't want to be programming Java - they want to be coding in something cooler. But the thing is, trendy doesn't necessarily equate to stability.

orls · on Feb 20, 2011

"When your only tool is XML, every problem looks like it's a schema declaration and a few XSLT transformations away from a nail"

This almost exactly describes a Project Manager I've worked with -- he has had experience with a CMS system that was entirely XML-document based, and sees almost every problem as solvable with XML/XSLT, and pushes it one his teams.

Which is a good example of why project managers shouldn't be allowed (or at least not depended on) to make technical architecture decisions. We've ended up doing a giant site search project based on huge XML indexes...

contextfree · on Feb 20, 2011

Five years ago I was a hardcore XML-hater (and an even more hardcore SOAP/WS-* hater), but these days I've sort of become an apologist for both. Not that XML has gotten any less ugly, I'm just annoyed at people mistaking superficial problems with deep problems, and start over every five years or so with a new syntax or something thinking it'll make those deep problems go away.

TeMPOraL · on Feb 20, 2011

In what way "computers adore XML"? Well, it can represent trees (and so can S-expressions), but its verboseness makes the files bigger, parsing more computation-intensive and memory consuming.

jister · on Feb 19, 2011

Technologies has its pros and cons so yeah, XML when used properly, doesn't kill babies.