In other recipe news, some devs @nytimes recently released some __VERY__ interesting code for handling unstructured data from the NYT recipe archives:
On Tuesday the Guardian reported that the BBC (fondly known as Auntie) would be archiving their recipes, so I quickly scraped the site and wrote a search engine for it.
Not sure how the site will evolve, if at all, but it was a fun side project!
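For anyone curious what a search engine over scraped recipes can look like at its simplest, here is a rough sketch of an inverted index in Python. It is only an illustration, not the code behind the actual site: the sample titles and the function names (`tokenize`, `build_index`, `search`) are made up, and it assumes you only search recipe titles and want every query word to match.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(recipes):
    """Map each token to the set of recipe ids whose title contains it."""
    index = defaultdict(set)
    for recipe_id, title in recipes.items():
        for token in tokenize(title):
            index[token].add(recipe_id)
    return index


def search(index, query):
    """Return ids of recipes whose title contains every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Start from the first token's matches, then intersect with the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data, not real scraped recipes.
recipes = {
    1: "Lemon drizzle cake",
    2: "Chocolate cake",
    3: "Lemon chicken",
}
index = build_index(recipes)
print(search(index, "lemon cake"))
```

A real version would probably also index ingredients and rank results, but the lookup idea stays the same.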
I thought archiving meant it would no longer be on their website. It would be awesome if they just released it all under something like the GPL. Could do some fun machine learning stuff with it...
I'd love them to release it as open data so that I'm not in murky waters.
If they move the recipes, I can update the links. If they take the recipes down (which they've now said they won't), I have all the data, so I could rehost them.
Edit: but if you want to do some fun ML work, my scraper should help get you started!
CRF Ingredient Phrase Tagger: https://github.com/NYTimes/ingredient-phrase-tagger
I used to work with a lot of recipe data in multiple languages, so this topic remains close to my heart.
This could be an interesting project to add to the training data.