I use HtmlUnit for scraping. It's a headless browser, although using WebKit would be far better.

It's a pity that WebKit lacks an easier integration path, such as a good COM/.NET object on Windows.




Crowbar is a Gecko-based scraper, if you're interested in using a real browser: http://simile.mit.edu/wiki/Crowbar


The issue is using Crowbar with the latest Gecko versions.


I've used HtmlUnit as well, mainly because it seems to be the most mature option. I find it chokes on some Google Maps JavaScript.



