bradfordcross: good overview, with one minor point. I wouldn't think of Pandora's problem as one mitigated with semi-supervised learning. That's usually applied to a situation where you have a small number of labeled points and a whole mass of unlabeled data; often the task is then to determine low density regions to define boundaries of natural clusters.
In Pandora's case, they have TONS of labeled data. All you'd need to do would be to run a decision tree (or a categorical-variable version of PCA) to (1) determine that many of those features are strongly statistically dependent and (2) reduce the number that need to be populated for any given song.
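A minimal sketch of that first step in Python with scikit-learn, assuming a hypothetical matrix of already-labeled categorical genome features (the data below is a synthetic placeholder, not anything Pandora has published): any feature a shallow tree can predict from the others is strongly dependent on them and is a candidate to drop from the manual workflow.

```python
# Hypothetical sketch: test how predictable each genome feature is from the
# others. A feature a shallow decision tree can recover from the rest is
# statistically dependent on them and need not be hand-labeled per song.
# The feature matrix is synthetic placeholder data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(5000, 20))   # 5000 songs x 20 categorical features
X[:, 1] = X[:, 0]                         # plant one dependency for illustration

for j in range(X.shape[1]):
    rest = np.delete(X, j, axis=1)        # all features except feature j
    score = cross_val_score(
        DecisionTreeClassifier(max_depth=5), rest, X[:, j], cv=3
    ).mean()
    if score > 0.9:                       # arbitrary "inferable" threshold
        print(f"feature {j}: {score:.0%} predictable from the others")
```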
You could probably also do supervised learning on their massive sound database to infer lots of these features automatically (e.g., I bet you could pick out male vs. female vocalists without having someone listen to the track).
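As a hedged illustration of that idea (the file paths, labels, and feature choice are all assumptions, not a description of Pandora's pipeline): summarize each track with mean MFCCs via librosa and fit a plain classifier for a single attribute like vocal gender.

```python
# Sketch of inferring one genome attribute (male vs. female vocalist) straight
# from audio. Paths and labels are placeholders; in practice the training list
# would come from the songs already hand-labeled.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def mfcc_summary(path):
    # Mean MFCCs are a crude but standard summary of timbre; enough for a sketch.
    y, sr = librosa.load(path, duration=30.0)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).mean(axis=1)

labeled = [("songs/track001.mp3", 0),     # 0 = male vocalist (placeholder)
           ("songs/track002.mp3", 1)]     # 1 = female vocalist (placeholder)

X = np.array([mfcc_summary(p) for p, _ in labeled])
y = np.array([lab for _, lab in labeled])
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Any unheard track can now be scored without a human listening to it.
print(clf.predict([mfcc_summary("songs/unlabeled.mp3")]))
```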
Combining these (supervised learning on the audio + a decision tree on the historical labels) would likely vastly increase their per-song labeling throughput. Only "global" features like song genre would have to be input by humans.
This is the same point I am making. If they are still manually curating each song at 30 minutes apiece, they could just stop, use the labels they already have, and infer the rest through semi-supervised learning, or by learning the target labels from the deconstructed tracks.
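A minimal sketch of that shortcut using scikit-learn's SelfTrainingClassifier (the features and the "unlabeled backlog" split below are synthetic assumptions): fit on the hand-curated songs, then let the model pseudo-label everything else.

```python
# Sketch: stop manual curation, keep the existing labels, and let a
# self-training (semi-supervised) model infer the backlog. Data is synthetic;
# in practice X would be audio-derived features for the whole catalog.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 50))          # per-song feature vectors
y_true = (X[:, 0] > 0).astype(int)         # one genome attribute (placeholder)

y_partial = y_true.copy()
y_partial[2_000:] = -1                     # -1 marks the never-curated songs

model = SelfTrainingClassifier(RandomForestClassifier(n_estimators=100))
model.fit(X, y_partial)

inferred = model.predict(X[2_000:])        # labels no human ever typed in
print("agreement with ground truth:", (inferred == y_true[2_000:]).mean())
```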
It is a good approach and one I share. NLP has done this for years: once you have a tagged corpus, it is much easier to work with your data. I also like your idea of using Mechanical Turk to gain traction on the manual tagging; in any case, that is probably what superintelligent computers would do over that 40-year span - use humans to do the tagging - before carrying out the balance of the calculations themselves! :)
One area the article did not touch on is how to introduce controls to detect 'rigging' of the system, i.e., similar to how Google polices link farms. This, in my opinion, is where the problem is turned inside out.