XML: The Angle Bracket Tax

mtts · on May 12, 2008

XML isn't used for its readability; it's used because it's a good standard format for exchanging data between different pieces of software. I work in the web shop business, and for us standardization on XML is a godsend: we no longer have to maintain (or write!) parsers for product data supplied in all sorts of different insane CSV-like formats. Instead we can simply throw a well-designed, off-the-shelf XML parser at the data. When a data file is corrupt, that saves us from having to spend hours on the phone with a partner company trying to figure out whether it's their data, their output software or our own parser that is broken. With XML it's either valid XML or it's not.
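A minimal sketch of that "either it parses or it doesn't" check, assuming Python's standard-library `xml.etree.ElementTree`. The sample product data and the `is_well_formed` helper are made up for illustration. Note that this tests well-formedness only; validating against a schema is a separate step that the standard library does not cover.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The message includes line and column, which helps when talking to the supplier.
        return False, str(err)
    return True, None

# Hypothetical partner feeds: the second one never closes <product>.
good = "<products><product sku='A1' price='9.99'/></products>"
bad = "<products><product sku='A1' price='9.99'></products>"

print(is_well_formed(good))
print(is_well_formed(bad))
```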

ajross · on May 12, 2008

Your point is that XML is good because it is standard. That is valid.

The OP link, however, was pointing out that while this is true, it's also true that XML is a very bad choice for that metadata format. Look at things like JSON or YAML for saner, simpler and more readable alternatives. We ended up with XML because it looked like HTML at a time when "web stuff" was all the rage, not because of its inherent value.

etal · on May 12, 2008

For some new work, I originally fought against using XML for storing data. Good ol' "key = value" would have done the trick, mostly, although as we kept adding data it might have been handy to impose more structure on it. Nonetheless, it was explained to me that the customer would have to be able to deal with these files, and we absolutely could not rely on them to touch the files with Notepad without screwing things up (CRLF vs LF, accidentally scrambling key names, etc.). What we can rely on, though, is XML:

1. Whipping up a web interface for editing the values in the files is trivial

2. If the customer or a less computer-savvy rep needs to tweak the files manually, we can tell them to use XML Notepad to minimize the risk of clobbering something.

I found that by using one layer of nodes and putting the data in attributes instead of their own nodes, the structure is roughly equivalent to INI or Apache config files, and not much more verbose. Maybe it's the Python in my blood speaking, but I appreciate having one obvious solution to a recurring, mundane problem -- organizing and storing small amounts of simple data in a way that anyone can retrieve it.
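To illustrate the "one layer of nodes, data in attributes" layout, here is a rough sketch assuming Python's standard `xml.etree.ElementTree` and `configparser`. The section names, keys and values are invented for the example; the point is only that both files load into the same nested dictionary.

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical config, once as flat XML and once as INI.
XML_TEXT = """<config>
  <server host="example.test" port="8080"/>
  <logging level="info" file="app.log"/>
</config>"""

INI_TEXT = """[server]
host = example.test
port = 8080

[logging]
level = info
file = app.log
"""

def load_xml(text):
    """Map each child element to a dict of its attributes."""
    root = ET.fromstring(text)
    return {child.tag: dict(child.attrib) for child in root}

def load_ini(text):
    """Map each INI section to a dict of its keys."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}

print(load_xml(XML_TEXT) == load_ini(INI_TEXT))
```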

xirium · on May 12, 2008

> use XML Notepad

If the format gets so unwieldy that you require special software, then the benefit of it being a human-readable text format is lost.

etal · on May 12, 2008

Agreed, but in this case the readability of the text format is only a minor benefit.

I didn't want to go on a tangent, but the problem is that vanilla Windows XP (not sure about Vista) does not make it safe to play with plain text files, especially those meant for a Linux system. Notepad is not safe. So to protect the unaware, we need another layer between the user and the raw data. Any format would be fine, as long as there's a free tool available that the customer and their paranoid IT support would both be comfortable with, which preferably might already be installed, and Windows knows not to open it with Notepad by default. Since other products on the same network already use XML, the customer is OK with staring at this format, expects that .xml files will pop up read-only in Internet Explorer, and knows that the angled brackets mean serious business.

If any other format met all these needs at once, I'd jump for it -- but I haven't seen anything else that does. Any suggestions for other formats that work well on both Windows and Unix?

xirium · on May 12, 2008

From the comments: Seems to me that far too many people reach for the 'silver bullet' that is XML, then end up with a big pile of mess.

The same happened with OO. It has uses but it is generally overused.

marketer · on May 12, 2008

"Wouldn't it be nice to have easily readable, understandable data and configuration files"

No, not really

How many times do people actually read XML data, or configuration files? Once in a while I suppose, when performing some configuration tasks, or debugging some data operation. But 99% of the time, your programs are the ones reading the XML. And this is exactly what XML is designed for - being easy for machines to read.

XML is easy to parse, because you have hundreds of available XML parsers. In many languages you don't even need a separate XML parser; support is included in the language (ActionScript and Scala come to mind). It's easy to query with XPath, and it's easy to validate with XSD. Most databases can store it natively. There's a whole ecosystem of XML tools that help you work with the data. It would be hard to trade all that for something slightly more readable.
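As a small illustration of the querying point, here is a sketch assuming Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath (attribute predicates are part of that subset). The catalog data and the `names_in_category` helper are hypothetical. XSD validation is left out because the standard library does not provide it.

```python
import xml.etree.ElementTree as ET

# Hypothetical product catalog.
CATALOG = """<catalog>
  <product sku="A1" category="book"><name>Dune</name></product>
  <product sku="B2" category="toy"><name>Kite</name></product>
  <product sku="C3" category="book"><name>Emma</name></product>
</catalog>"""

def names_in_category(xml_text, category):
    """Return product names whose category attribute matches, in document order."""
    root = ET.fromstring(xml_text)
    # Fine for trusted input; do not build paths from untrusted strings this way.
    matches = root.findall(f".//product[@category='{category}']/name")
    return [match.text for match in matches]

print(names_in_category(CATALOG, "book"))
```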

ajross · on May 12, 2008

Most of us would submit that people might read XML data more often if it didn't look like crap. If all we wanted was a metadata format that was easy for machines to read, we wouldn't have specified it in text.

If you're not reading your XML, you're probably using it wrong. And even if you personally don't care, we're still at the mercy of all the developers out there specifying their configuration formats in XML. If you're not reading your configuration files, you're definitely using them wrong.

jsmcgd · on May 12, 2008

Most people around here look at XML a lot, i.e. XHTML.

Tichy · on May 12, 2008

Obviously you are not a Java developer.

rplevy · on May 12, 2008

I read somewhere that JavaScript was originally implemented in Lisp. Wouldn't it have been so much better if they had stuck with Lisp as the basis for extending Firefox (Netscape at the time)? Maybe in that alternate history it would have seemed obvious to use Lisp as a common data format instead of XML, it being the clearest and cleanest syntax for complex nested data/s-expressions. Easy to read and better than XML as a common format. As is, JavaScript eventually matured into a pretty damn good language (though not as good as Common Lisp, not even as good as Emacs Lisp), but XML is pretty lousy.
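For a sense of how little machinery s-expressions need as a data format, here is a toy reader in Python. It is a sketch under simplifying assumptions: atoms are bare tokens with no quoting, so strings containing spaces or parentheses are not handled, and the product record is invented for the example.

```python
def tokenize(text):
    """Split an s-expression into parentheses and bare atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Consume tokens from the front and return one atom or nested list."""
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise ValueError("missing closing parenthesis")
        tokens.pop(0)  # discard the matching ")"
        return items
    if token == ")":
        raise ValueError("unexpected closing parenthesis")
    return token

def read_sexpr(text):
    """Parse exactly one s-expression from text."""
    tokens = tokenize(text)
    result = parse(tokens)
    if tokens:
        raise ValueError("trailing input after expression")
    return result

# Hypothetical record, comparable to a small <product> element.
DATA = "(product (sku A1) (name Dune) (price 9.99))"
print(read_sexpr(DATA))
```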

demallien · on May 13, 2008

I used to use XML, until I finally got around to learning how to write parsers myself. At that point the lightbulb went on, and I realised that I could produce formats that are much easier for humans to read/edit, much smaller, and much quicker for the parser to analyse.

XML knows nothing about the structure of your data. This is intended, but it means that you have to describe the structure of the data along with the data. Seeing as the software that is going to use the XML has to know about the structure anyway, if it is going to do anything useful, this just means that you end up specifying the structure in the software and in the XML. Blech!

Go and read up on what yacc and all of its descendants can do for you. It'll open up worlds for you :-)
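A hand-written parser for a small format can be very short. The sketch below is not yacc-generated; it is a plain line-by-line parser in Python for a hypothetical "key = value" format with "#" comments, and the key names are invented. It tolerates CRLF line endings because `str.splitlines` splits on them.

```python
def parse_settings(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator or not key.strip():
            # Report the line number so a human editor can find the mistake.
            raise ValueError(f"line {number}: expected 'key = value', got {raw!r}")
        settings[key.strip()] = value.strip()
    return settings

# Hypothetical file saved with Windows line endings.
SAMPLE = "# shop settings\r\nhost = example.test\r\nport = 8080\r\n"
print(parse_settings(SAMPLE))
```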

mullr · on May 13, 2008

I've done both, and I regret the mini-parsers I wrote. It was for stupid stuff, like config files. The main problem is that nobody else could do anything with the code. XML is at least a common denominator. Not the lowest, but pretty low. :)

I've found it easiest to deal with xml in a very focused way. In my case, I always define a schema and use a code generator to get things in/out of memory. From there on, I don't care about the xml anymore. This is fast and reliable. The only real downsides are (a) ugly xml syntax, and (b) having to remember how XSD works every time.
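The in/out-of-memory step might look roughly like the sketch below. This is a hand-written stand-in for what a schema-driven code generator would emit, not the output of any particular tool; the `Product` class, its fields and the element names are assumptions for illustration.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass
class Product:
    """Hypothetical in-memory type; a generator would derive this from the schema."""
    sku: str
    name: str
    price: float

def product_from_xml(element):
    """Read one <product> element into a typed object."""
    return Product(
        sku=element.get("sku"),
        name=element.findtext("name"),
        price=float(element.findtext("price")),
    )

def product_to_xml(product):
    """Write a Product back out as a <product> element."""
    element = ET.Element("product", sku=product.sku)
    ET.SubElement(element, "name").text = product.name
    ET.SubElement(element, "price").text = f"{product.price:.2f}"
    return element

SOURCE = '<product sku="A1"><name>Dune</name><price>9.99</price></product>'
item = product_from_xml(ET.fromstring(SOURCE))
print(item)
```

After the load step the rest of the program works with `Product` objects and never touches the XML tree.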

Where I really object to XML is as an internal construct in a system. It's useful for putting things on disk, and it's useful for interchange with damn near anybody because of the ubiquitous libraries. But you don't want to be keeping your core application state in a DOM. Or using it as an inter-component interface. I've seen more than one person do that, and it made me very sad.

dpapathanasiou · on May 12, 2008

Before I read the article, I thought "angle bracket tax" would refer to how much bulkier (both in terms of storage & bandwidth) xml can be.

In some cases, the trade-off between larger data streams and semantic descriptiveness is not compelling.
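The size overhead is easy to measure for a given data set. Here is a sketch assuming Python's standard `json` and `xml.etree.ElementTree`, with made-up records, element-per-field XML and compact JSON. The ratio depends heavily on tag names and layout, so this shows how to measure rather than a general result.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical records; every value is a string to keep both encodings comparable.
RECORDS = [
    {"sku": "A1", "name": "Dune", "price": "9.99"},
    {"sku": "B2", "name": "Kite", "price": "4.50"},
]

def to_xml(records):
    """Serialize records with one child element per field."""
    root = ET.Element("products")
    for record in records:
        item = ET.SubElement(root, "product")
        for key, value in record.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")

def to_json(records):
    """Serialize records as compact JSON without extra whitespace."""
    return json.dumps(records, separators=(",", ":"))

xml_size = len(to_xml(RECORDS).encode("utf-8"))
json_size = len(to_json(RECORDS).encode("utf-8"))
print(xml_size, json_size)
```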