
On one hand, it makes a lot of sense that many web publishers want to keep people from scraping their content, given how often scraping is used nefariously: to violate copyright or to generate spam.

But there are entirely legitimate reasons to scrape as well. Altmetric (https://www.altmetric.com), the company I work for, tracks links to scientific research. When someone on e.g. Twitter links to a page on nature.com, we want to scrape the page they linked to and figure out which paper they're talking about (if any). Academic publishers can be particularly sensitive to scraping, which makes the endeavour much more work than it needs to be.
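
For what it's worth, the resolution step is conceptually simple. Here's a minimal Python sketch, assuming the publisher page exposes its DOI through standard citation meta tags (e.g. Highwire's citation_doi or Dublin Core's dc.identifier); the example URL and tag names are illustrative, and a production pipeline handles far more publisher-specific formats:

    import re
    import requests

    # Meta tag names that commonly carry a DOI (assumption: the target
    # page uses one of these conventions; many publishers do, since
    # Google Scholar indexes the Highwire tags).
    DOI_META_NAMES = ("citation_doi", "dc.identifier")

    def extract_doi(url: str) -> str | None:
        """Fetch a page and return the first DOI found in its meta tags."""
        html = requests.get(url, timeout=10,
                            headers={"User-Agent": "example-bot/0.1"}).text
        for name in DOI_META_NAMES:
            # Match <meta name="citation_doi" content="10.1038/...">
            # (assumes name= precedes content=, which is the common order;
            # a real implementation would use a proper HTML parser)
            m = re.search(
                rf'<meta[^>]+name=["\']{re.escape(name)}["\'][^>]+'
                rf'content=["\']([^"\']+)',
                html, re.IGNORECASE)
            # DOIs always start with the "10." directory prefix
            if m and m.group(1).strip().startswith("10."):
                return m.group(1).strip()
        return None

    if __name__ == "__main__":
        # Hypothetical article URL; any paper landing page would do.
        print(extract_doi("https://www.nature.com/articles/s41586-020-2649-2"))

In practice you'd layer on per-publisher fallbacks, redirects, and rate limiting, but the principle is the same: fetch the linked page, pull out an identifier, match it to a paper.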

It's a real shame that the web has become so closed off in so many ways.



The web is not becoming closed off from users. It's becoming hostile to bots. Not the same.



