

jedahan on May 22, 2014 | on: New York's Metropolitan Museum of Art makes availa...

I wrote a very small API that scrapes the Metropolitan Museum's collections pages to get object info. Pull requests, issues, and feature requests are welcome. I had to rush a rewrite last weekend because they had completely redone the collections pages (and they now export a bit via XML: try appending xml=1 to any object page).
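For anyone who wants to try the xml=1 trick from a script, here is a minimal sketch of adding that parameter to a URL. The helper name `with_xml_flag` and the example.org address are placeholders I made up for illustration, not the museum's real URL scheme and not code from the collections-api repo. It only builds the URL and does not fetch anything.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_xml_flag(object_url: str) -> str:
    """Return object_url with xml=1 set in its query string.

    Hypothetical helper: it only assumes that the XML export is toggled
    by an `xml=1` query parameter, as described in the comment above.
    """
    parts = urlsplit(object_url)
    # Keep existing parameters, dropping any previous `xml` value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "xml"
    ]
    query.append(("xml", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Placeholder URL, not a real object page.
    print(with_xml_flag("https://example.org/collection/object/12345"))
```

Going through the parsed query string, rather than just concatenating "?xml=1", keeps the result valid when the page URL already carries other parameters.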

http://github.com/jedahan/collections-api

http://scrapi.org

This is my first foray into generators, and into building an API that will hopefully be used more publicly, so any suggestions are welcome.

Also, if you want to chat about what it's like working for the Met, and more specifically the media lab there, ask away!

Also, if you are looking for a giant dump of those images on a hard drive, get in contact with me, as the site is not the easiest to crawl. I only have about 260,000 of the images, and it's about 260 GB.

ars on May 22, 2014

Upload them to Wikimedia Commons. See: https://commons.wikimedia.org/wiki/Commons:Guide_to_batch_up...

Better finish getting them all before the museum adds restrictions.

matt_morgan on May 22, 2014

Is there any concern over there about the misleading rhetoric around this, i.e., that it just happened, not that it happened in 2011?
