I wrote a very small API that scrapes the Metropolitan Museum's collections pages to get info. Pull requests, issues, and feature requests are welcome. I had to rush a rewrite last weekend because they had completely redone the collections pages (and now export a bit via XML: try appending xml=1 to any object page).
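For anyone who wants to try the xml=1 trick from a script, here is a minimal sketch of adding that parameter to a page URL. The function name `with_xml_flag` and the example URL are made up for illustration; this is not code from the API itself, and it makes no assumption about what the XML response contains.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_xml_flag(page_url: str) -> str:
    """Return page_url with xml=1 set in its query string."""
    parts = urlparse(page_url)
    # Keep existing parameters, dropping any previous xml value to avoid duplicates.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "xml"]
    query.append(("xml", "1"))
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical object page URL, used only to show the transformation.
print(with_xml_flag("http://example.org/collections/object/123?pos=4"))
```

Going through the query string rather than concatenating text means the result is still valid whether or not the page URL already has parameters.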
This is my first foray into generators and into building an API that will hopefully be used more publicly, so any suggestions are welcome.
Also, if you want to chat about what it's like working for the Met, and more specifically the media lab there, ask away!
Also, if you are looking for a giant dump of those images on a hard drive, get in contact with me, as the site is not the easiest to crawl. I only have about 260,000 of the images, and it's about 260 GB.
http://github.com/jedahan/collections-api
http://scrapi.org