I'm fascinated with tagging, ontology and organizing information, and consider current methods badly flawed; so much so that I tried to make a startup out of my ideas (turned out to be too complex/ambitious).
My conclusion was that improving on existing systems is possible, but will require an awful lot of effort. The way forward, IMHO, is a kind of probabilistic ontology, based on mining lots of data combined with careful use of human intelligence.
Unfortunately that's as far as I got with the idea, though I hope someone else can make more progress than I did. Improving the ability to organize information seems to me to be a crucial problem that gets far too little attention, probably because it is a very hard problem. I will be surprised and disappointed if the best we can do 10 years from now is just search, categorisation and tagging.
The criticism is indeed fair, but I don't think it really sinks Shirky's key arguments. Shirky may have shortchanged professional ontologies and glossed over some details, but at the end of the day professional ontologies are expensive to create and maintain. With the explosion of content on the web, there's clearly a huge gap between free-text search and professional categorization.
The interesting part of the article for me (4 years later) is that it points to the promise of data mining tags, which we've only just begun to scratch the surface of.
I think that there's merit to his criticism of purely hierarchical classification; he just completely fails to acknowledge that to cataloging professionals this is old news.
(The "gee whiz, tags solve everything" hype has been sufficiently deflated elsewhere, IMHO.)
A few years back I was working on a blue-sky real estate database project. I spent a lot of time trying to decide between building the project with a triple store or an RDBMS.
The reason I mention this is that the parent notes that "professional ontologies are expensive to create and maintain", which is very true. But even worse, as in Shirky's DSM example, you have to be an expert in the ontology to make good use of it!
To me, this was the critical argument against doing the project with a triple store, and it ultimately made me a pessimistic cynic about the semantic web. I can't put my finger on it, but it just feels way too complex to work in reality.
People don't like systems that present themselves as deterministic (not sure if that's the right word) but then behave heuristically. However, people are fine with a system that presents itself as heuristic and behaves that way. Ex: Google.
The thing that strikes me the most is how dated this is already. Tagging -- the novel concept it introduces at great length -- won, and all the motivating arguments for it just seem obvious now.
Obviously, that's not a criticism of the piece itself, which is very good. It's just so (... quickly look up date ...) 2005.
I agree, but I think it has interesting parallels to problems that crop up in sufficiently complicated OO class hierarchy designs, and I'm not sure that discussion surrounding "design patterns" has fully acknowledged such issues.
People in some other niches (librarians, for one) have spent a lot of time thinking about these problems. They're generally struggling with managing a different kind of complexity than programmers are, but I'd love to see more cross-pollination of ideas.
That's true, a lot of the fundamental problems are the same. I suppose OO design patterns are workarounds for some of them.
Some time ago I read a collection of articles by Michael Jackson (the computing one) and it turns out he was making similar arguments against this kind of ontology in the 70s.
Incidentally, I always thought "ontology" meant "philosophy of being", not cataloguing systems.
To be fair, it wasn't novel at the time either (why is there no date on the article?). What I find interesting is that it does point out some of the unfulfilled promise of tag data-mining. We have all these tags now, but we're still very far from extracting their full value.
Definitely! I found the article doubly interesting because 1) I worked in a library through college* and remember finding such skews in organization rather odd, and 2) while he doesn't have any programming-specific examples, it also shows why trying to fit everything into a hierarchy of classes tends to involve a lot of awkward workarounds in any sufficiently complicated (i.e., real) system.
I don't have a CS degree (history, actually), but I wonder if, when clichéd examples such as Animal -> Mammal -> Dog/Cat/etc. were presented ("See! You can fit things into a tree. OO simulates the real world!"), somebody had the foresight to ask how things like monotremes (egg-laying mammals) are represented. (Hopefully without being smug about it; it's an important question, far too important to just be a chance for That Guy to be a smart-ass.)
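To make the monotreme question concrete, here's a rough sketch in Python. The class names, the tags and the `having` helper are all made up for illustration; this isn't anyone's real design. A strict single-inheritance tree has to patch the platypus in as an exception, while independent tags just record that it is both a mammal and an egg-layer.

```python
from dataclasses import dataclass


# Approach 1: a single-inheritance tree. The hierarchy bakes in the
# assumption that every Mammal gives live birth.
class Animal:
    def reproduction(self) -> str:
        raise NotImplementedError


class Mammal(Animal):
    def reproduction(self) -> str:
        return "live birth"


class Bird(Animal):
    def reproduction(self) -> str:
        return "eggs"


class Dog(Mammal):
    pass


class Platypus(Mammal):
    # The monotreme case: it has to override the inherited default,
    # an exception patched into the tree rather than modeled by it.
    def reproduction(self) -> str:
        return "eggs"


# Approach 2: independent facets (tags). "Is a mammal" and "lays eggs"
# are separate properties, so neither has to be nested under the other.
@dataclass(frozen=True)
class Species:
    name: str
    tags: frozenset


# Hypothetical example catalog.
CATALOG = [
    Species("dog", frozenset({"mammal", "live-birth"})),
    Species("platypus", frozenset({"mammal", "lays-eggs"})),
    Species("robin", frozenset({"bird", "lays-eggs"})),
]


def having(*tags: str) -> list:
    """Return the names of species carrying every requested tag."""
    wanted = set(tags)
    return [s.name for s in CATALOG if wanted <= s.tags]


if __name__ == "__main__":
    print(Platypus().reproduction())       # eggs
    print(having("lays-eggs"))             # ['platypus', 'robin']
    print(having("mammal", "lays-eggs"))   # ['platypus']
```

This isn't meant to show that tags beat class hierarchies in general, just where the tree forces the kind of awkward workaround mentioned above.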
Library and information science professionals (e.g. catalogers) understand the enormity of the issue better than most programmers, in my experience. There's a lot of room for cross-pollination of ideas there.
Thank you! To get posting rights on Y Combinator, you should have to answer a quiz, and there should be questions about this Shirky post on it: this one and the one about micropayments. I can't believe some idiot got on the Daily Show and the front cover of Time without reading Shirky.