
In my experience, scraping Google data is hard. I did it years ago for a project and had to lease a huge block of private proxies. Each one only lasted a few minutes before getting blocked, but with a large enough pool they'd come back into rotation.
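
A minimal sketch of that rotation pattern, assuming an in-memory pool where a burned proxy sits out a cooldown and then returns. The proxy addresses, the cooldown length, and the use of the requests library are my assumptions, not the original setup:

    import random
    import time

    import requests

    PROXIES = ["http://proxy1.example:3128", "http://proxy2.example:3128"]  # placeholder addresses
    COOLDOWN_SECONDS = 300   # rough guess at "a few minutes"
    burned = {}              # proxy -> timestamp it last got blocked

    def pick_proxy():
        now = time.time()
        usable = [p for p in PROXIES if now - burned.get(p, 0) > COOLDOWN_SECONDS]
        if not usable:
            raise RuntimeError("whole pool is cooling down; wait and retry")
        return random.choice(usable)

    def fetch(url):
        proxy = pick_proxy()
        try:
            resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)
        except requests.RequestException:
            burned[proxy] = time.time()   # network error: bench this proxy for the cooldown
            return None
        if resp.status_code == 429 or "captcha" in resp.text.lower():
            burned[proxy] = time.time()   # blocked: bench it and let the cooldown expire
            return None
        return resp.text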


Google Web Search especially -- I've found that anything more frequent than one query every 5-10 minutes starts triggering CAPTCHAs. For what I was doing, there was no other way to get the information I was looking for, so I just resigned myself to very slow crawls.
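
In practice that "very slow crawl" amounts to little more than a long randomized pause between queries. A sketch, treating the fetch callable and the exact 5-10 minute window as placeholders:

    import random
    import time

    def slow_crawl(queries, fetch):
        # Run queries one at a time, sleeping 5-10 minutes between them
        # to stay under the observed CAPTCHA threshold.
        results = {}
        for q in queries:
            results[q] = fetch(q)
            time.sleep(random.uniform(5 * 60, 10 * 60))
        return results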

For Google+ itself, over a period of years, I'd hammered it with 100s to ~100k requests at a time from residential IP space, at roughly 2-20 queries/second, without ever hitting rate limiting.

We've started hearing reports of rate limiting over the past few months as archival activity has proceeded.




